Princeton University

We consider a discrete time hidden Markov model where the sig- 
nal is a stationary Markov chain. When conditioned on the observa- 
tions, the signal is a Markov chain in a random environment under 
the conditional measure. It is shown that this conditional signal is 
weakly ergodic when the signal is ergodic and the observations are 
nondegenerate. This permits a delicate exchange of the intersection 
and supremum of $\sigma$-fields, which is key for the stability of the nonlinear
filter and partially resolves a long-standing gap in the proof of
a result of Kunita [J. Multivariate Anal. 1 (1971) 365-393]. A similar 
result is obtained also in the continuous time setting. The proofs are 
based on an ergodic theorem for Markov chains in random environ- 
ments in a general state space. 

1. Introduction. Consider a discrete time Markov chain $(X_n)_{n\in\mathbb{Z}_+}$
and a random process $(Y_n)_{n\in\mathbb{Z}_+}$ such that $Y_n$ and $Y_m$ ($n\ne m$)
are conditionally independent given $(X_n)_{n\in\mathbb{Z}_+}$ and such that the
conditional distribution of $Y_n$ given $(X_n)_{n\in\mathbb{Z}_+}$ depends only on
$X_n$. Then the pair $(X_n,Y_n)_{n\in\mathbb{Z}_+}$ defines a hidden Markov model,
where the observation process $(Y_n)_{n\in\mathbb{Z}_+}$ provides indirect
information on the signal process $(X_n)_{n\in\mathbb{Z}_+}$. Models of this form
have a wide array of applications in statistics, engineering and finance, and
possess a rich theory of statistical inference [7]. Of particular interest in the
present paper is the filtering problem, which aims to estimate the current
state $X_n$ of the signal given the observation history $(Y_k)_{0\le k\le n}$ by
computing the regular conditional probability
$\mathbf{P}(X_n\in\cdot\,|\,(Y_k)_{0\le k\le n})$. A similar class of
problems can also be formulated in continuous time.
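
For a finite state space the filtering problem reduces to a simple recursion, which may help to fix ideas. The following is a minimal sketch only: it assumes a signal with states {0, ..., m-1}, and the names `run_filter`, `P`, `mu` and `g` as well as all numerical values are illustrative assumptions that do not appear in the paper, whose setting is a general Polish state space.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def normalize(weights: Sequence[float]) -> Vector:
    """Rescale nonnegative weights to a probability vector."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("observation has zero likelihood under the current law")
    return [w / total for w in weights]


def run_filter(mu: Sequence[float],
               P: Sequence[Sequence[float]],
               g: Callable[[int, int], float],
               ys: Sequence[int]) -> List[Vector]:
    """Return [Pi_0, ..., Pi_n] with Pi_k[j] = P(X_k = j | Y_0, ..., Y_k).

    mu is the law of X_0, P the transition matrix of the signal, and
    g(x, y) the likelihood of observing y when the signal is in state x.
    """
    m = len(mu)
    # Time 0: no transition has occurred yet, so only Bayes' rule is applied.
    pi = normalize([mu[j] * g(j, ys[0]) for j in range(m)])
    filters = [pi]
    for y in ys[1:]:
        # Prediction: push the current filter through the transition matrix.
        predicted = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
        # Correction: reweight by the likelihood of the new observation.
        pi = normalize([predicted[j] * g(j, y) for j in range(m)])
        filters.append(pi)
    return filters


# Hypothetical two-state example: the observation equals the state w.p. 0.8.
P = [[0.9, 0.1], [0.2, 0.8]]
mu = [0.5, 0.5]


def g(x: int, y: int) -> float:
    return 0.8 if x == y else 0.2


filters = run_filter(mu, P, g, [0, 0, 1])
```

Each entry of `filters` is a probability vector on the signal state space; the recursion alternates a prediction step (the signal dynamics) with a correction step (Bayes' rule).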

This paper is concerned with the long time properties of the nonlinear
filter, that is, we are interested in the behavior of the regular conditional
probabilities $\Pi_n=\mathbf{P}(X_n\in\cdot\,|\,(Y_k)_{0\le k\le n})$ as
$n\to\infty$, in the case that the signal possesses an invariant probability
measure $\pi$. The investigation of such
problems in general hidden Markov models has a long history, starting with
the pioneering work of Kunita [23] (in the continuous time setting) on the
stationary behavior of the mean square estimation error of the nonlinear
filter. To study this problem, he established the following key result [23],
Theorem 3.3: for any invariant measure $\pi$ of the signal, the filtering process
$(\Pi_n)_{n\in\mathbb{Z}_+}$ possesses a unique invariant measure with barycenter
$\pi$ if and only if the signal is ergodic in a particular sense (see below).

A different but closely related problem of interest is the stability of
nonlinear filters. Denote by $\mathbf{P}^\mu$ the law of
$(X_n,Y_n)_{n\in\mathbb{Z}_+}$ with the initial law $X_0\sim\mu$, and write the
corresponding filter as
$\Pi_n^\mu=\mathbf{P}^\mu(X_n\in\cdot\,|\,(Y_k)_{0\le k\le n})$.
In practice, the initial measure $\mu$ (the Bayesian prior) is rarely known
precisely, and it is thus highly desirable that the filter $\Pi_n^\mu$ becomes
insensitive to the choice of $\mu$ as $n\to\infty$ (e.g., as in Theorem 5.2
below). When this is
the case, the filter is said to be stable. In a pioneering paper, Ocone and 
Pardoux [25] used Kunita's theorem to establish that stability of the filter 
is inherited from the ergodicity of the signal process. 

The asymptotic properties of nonlinear filters have received considerable 
attention in recent years (see, e.g., [12] and the references therein). Besides
the fundamental interest of the topic, results in this direction have a variety 
of applications, which include uniform convergence of filter approximations 
[5, 6, 13, 14], maximum likelihood estimation [7, 8, 19], stochastic control 
[18, 31] and estimation error bounds [3, 23]. In various specific cases one can 
even obtain detailed quantitative information about the rate of stability of 
the filter (see [12] for references). In the general setting, however, little is 
known about the asymptotic properties of nonlinear filters beyond the work 
of Kunita [23] and subsequent papers, such as [25], which rely directly on 
the approach of [23] (but see [36]). 

Unfortunately, as was pointed out in [1], there is a serious gap in the
proof of the main result in [23]. To describe the problem, let us suppose
that the signal process possesses an invariant probability measure $\pi$. Then
$\mathbf{P}^\pi$ is a stationary measure, and we can therefore extend the
stationary hidden Markov model to two-sided time $(X_n,Y_n)_{n\in\mathbb{Z}}$ by
a standard argument. Denote by $\mathbf{P}$ the extension of $\mathbf{P}^\pi$ to
two-sided time, and define the $\sigma$-fields
$\mathcal{F}^X_I=\sigma\{X_n:n\in I\}$ and $\mathcal{F}^Y_I=\sigma\{Y_n:n\in I\}$
($I\subset\mathbb{Z}$). The key step in Kunita's proof is to argue that his
result would follow if we could establish that the following identity holds
true:

$$\bigcap_{n\ge0}\mathcal{F}^Y_{]-\infty,0]}\vee\mathcal{F}^X_{]-\infty,-n]}=\mathcal{F}^Y_{]-\infty,0]}\quad\mathbf{P}\text{-a.s.}$$

He proceeds to argue as follows. Suppose that the signal satisfies the
following ergodicity condition:
$\bigcap_{n\ge0}\mathcal{F}^X_{]-\infty,-n]}$ is $\mathbf{P}$-a.s. trivial. Then

$$\bigcap_{n\ge0}\mathcal{F}^Y_{]-\infty,0]}\vee\mathcal{F}^X_{]-\infty,-n]}=\mathcal{F}^Y_{]-\infty,0]}\vee\bigcap_{n\ge0}\mathcal{F}^X_{]-\infty,-n]}=\mathcal{F}^Y_{]-\infty,0]}\quad\mathbf{P}\text{-a.s.}$$

The exchange of the intersection and supremum of $\sigma$-fields is not at all
obvious, however, and no proof of this assertion is provided in [23]. Indeed, 
this exchange is not permitted in general, as an illuminating counterexample 
in [1] shows. 

It is important to note, on the other hand, that all known counterexam- 
ples rely in an essential way on the degeneracy of the observation model, 
that is, $Y_k=h(X_k)$ for some function $h$ without any additional noise. It is
therefore tempting to conjecture that the exchange of intersection and
supremum is always permitted provided that the observations are nondegenerate,
which is most naturally imposed in our general setting by requiring that the
conditional law of $Y_n$ given $(X_k)_{k\in\mathbb{Z}}$ satisfies

$$\mathbf{P}(Y_n\in A\,|\,(X_k)_{k\in\mathbb{Z}})=\int I_A(u)\,g(X_n,u)\,\varphi(du)\quad\mathbf{P}\text{-a.s.},$$

where $\varphi$ is a fixed reference measure and $g$ is a strictly positive function.
Though no counterexamples are known, it is unclear whether or not this is 
the case, and the (positive or negative) verification of this conjecture remains 
an open problem. 

From the work of Budhiraja [4] and of Baxendale, Chigansky and Liptser 
[1], and from the results of Section 5 below, it is clear that Kunita's exchange 
of intersection and supremum and its time-reversed cousin 

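$$\bigcap_{n\ge0}\mathcal{F}^Y_{]-\infty,0]}\vee\mathcal{F}^X_{]-\infty,-n]}=\mathcal{F}^Y_{]-\infty,0]}\quad\text{and}\quad\bigcap_{n\ge0}\mathcal{F}^Y_{[0,\infty[}\vee\mathcal{F}^X_{[n,\infty[}=\mathcal{F}^Y_{[0,\infty[}\quad\mathbf{P}\text{-a.s.}$$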

lie at the heart of the qualitative asymptotic theory of nonlinear filtering. 
The main result of this paper, Theorem 4.2, establishes that both these 
identities do indeed hold under conditions that are only mildly stronger 
than those assumed by Kunita. Given an invariant probability measure $\pi$ of
the signal process, we assume the following:

1. The signal is ergodic in the following sense:

$$\|\mathbf{P}^z(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0\quad\text{for }\pi\text{-a.e. }z,$$

where $\|\cdot\|_{\mathrm{TV}}$ is the total variation norm (Assumption 3.1 below).

2. The observations are nondegenerate (Assumption 3.2 below).

These assumptions are satisfied by the vast majority of stationary hidden 
Markov models of practical interest, including the important case of aperi- 
odic and positive Harris recurrent signals with nondegenerate observations. 
Note that we do not require the Feller assumption, and that we allow for sig- 
nal and observation processes with arbitrary Polish state spaces (the Polish 
assumption guarantees an abundance of regular conditional probabilities). 
The latter has the additional advantage that our results extend directly to 
the continuous time setting (Section 6). 

Besides our main result, this paper contains two additional results which
are of independent interest. First, as we will discuss shortly, the proof of 
our main result hinges on the ergodic theory of Markov chains in random 
environments as developed by Cogburn [10, 11] and Orey [26] for countable 
state spaces. In Section 2, we prove the counterpart of a result from [11] 
for Markov chains in random environments on general Polish state spaces 
(Theorem 2.3). This result is not specific to hidden Markov models, and 
could be relevant in other settings. 

Second, we will show in Section 5 that the permissibility of the exchange 
of intersection and supremum leads to the stability of the nonlinear filter in 
a much stronger sense than was previously established in [1, 4, 25]. A special 
case of our main stability theorem (Theorem 5.2) is the following result: if 
the signal is aperiodic and positive Harris recurrent, and if the observations 
are nondegenerate, then 

$$\|\Pi_n^\mu-\Pi_n^\nu\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0\quad\mathbf{P}^\gamma\text{-a.s. for all }\mu,\nu,\gamma.$$

Similar results hold in the continuous time setting (Section 6). 
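
The stability property can be observed numerically in the simplest possible case. The sketch below is an illustration under stated assumptions, not the general theorem: it takes a hypothetical two-state signal with a strictly positive transition matrix (hence aperiodic and positive recurrent) and a strictly positive likelihood (hence nondegenerate observations), generates observations from a signal started at a third initial law, and runs two filters with very different priors on the same observations. All names and numbers are chosen for the example.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def tv_norm(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation norm sum_j |p_j - q_j|, which takes values in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(p, q))


def filter_path(prior: Sequence[float],
                P: Sequence[Sequence[float]],
                lik: Callable[[int, int], float],
                ys: Sequence[int]) -> List[Vector]:
    """Filters Pi_0, ..., Pi_n computed from the given prior."""
    m = len(prior)

    def correct(law: Sequence[float], y: int) -> Vector:
        w = [law[j] * lik(j, y) for j in range(m)]
        total = sum(w)
        return [v / total for v in w]

    pi = correct(prior, ys[0])
    out = [pi]
    for y in ys[1:]:
        predicted = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
        pi = correct(predicted, y)
        out.append(pi)
    return out


P = [[0.9, 0.1], [0.2, 0.8]]  # strictly positive transition matrix


def lik(x: int, y: int) -> float:
    return 0.8 if x == y else 0.2  # strictly positive likelihood


# Simulate observations from a signal started in state 0 (initial law gamma).
rng = random.Random(0)
x = 0
ys: List[int] = []
for _ in range(50):
    ys.append(x if rng.random() < 0.8 else 1 - x)
    x = 0 if rng.random() < P[x][0] else 1

filters_mu = filter_path([0.99, 0.01], P, lik, ys)  # prior mu
filters_nu = filter_path([0.01, 0.99], P, lik, ys)  # prior nu
distances = [tv_norm(p, q) for p, q in zip(filters_mu, filters_nu)]
```

In this example the entries of `distances` start close to the maximal value 2 and become negligible after a few dozen observations, although neither prior agrees with the law from which the signal was actually started.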

The remainder of this section is devoted to a guided tour through our 
proofs. 

1.1. The method of von Weizsäcker and the conditional signal. In [37],
von Weizsäcker has studied the exchange of intersection and supremum
problem in a general setting. Following his approach, one can establish the
following illuminating result. Let $\mathcal{G}_n$, $n\in\mathbb{N}$ be a
decreasing family of countably generated $\sigma$-fields and let $\mathcal{F}$ be
another countably generated $\sigma$-field. Then

$$\bigcap_{n\in\mathbb{N}}\mathcal{F}\vee\mathcal{G}_n=\mathcal{F}\ \ \mathbf{P}\text{-a.s.}\quad\text{iff}\quad\bigcap_{n\in\mathbb{N}}\mathcal{G}_n\text{ is }P(\omega,\cdot)\text{-a.s. trivial for }\mathbf{P}\text{-a.e. }\omega,$$

where $P(\omega,\cdot)$ is a version of the regular conditional probability
$\mathbf{P}(\cdot|\mathcal{F})$. It would appear at first glance that
$\mathbf{P}$-a.s. triviality of the tail $\sigma$-field
$\bigcap_{n\in\mathbb{N}}\mathcal{G}_n$ automatically implies that it is also
$\mathbf{P}(\cdot|\mathcal{F})$-a.s. trivial; after all, it is elementary that
$\mathbf{P}(A|\mathcal{F})=\mathbf{P}(A)$ $\mathbf{P}$-a.s. whenever
$\mathbf{P}(A)=0$ or $\mathbf{P}(A)=1$. However,
the tail $\sigma$-field is not countably generated, so we cannot eliminate the
dependence of the exceptional set on $A$. Verification of
$\mathbf{P}(\cdot|\mathcal{F})$-a.s. triviality is
thus a nontrivial problem.
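
For the reader's convenience, the elementary fact about events of probability zero or one can be checked as follows. This short derivation is a standard argument added for illustration; it is not taken from the paper.

```latex
% Case P(A) = 0: a nonnegative random variable with zero mean vanishes a.s.
\mathbf{P}(A\mid\mathcal{F}) \ge 0
\quad\text{and}\quad
\mathbf{E}\bigl[\mathbf{P}(A\mid\mathcal{F})\bigr] = \mathbf{P}(A) = 0
\;\Longrightarrow\;
\mathbf{P}(A\mid\mathcal{F}) = 0 \quad \mathbf{P}\text{-a.s.}

% Case P(A) = 1: apply the previous case to the complement A^c.
\mathbf{P}(A\mid\mathcal{F}) = 1 - \mathbf{P}(A^c\mid\mathcal{F}) = 1
\quad \mathbf{P}\text{-a.s.}
```

The null set on which the identity may fail depends on the event $A$. A union of such null sets over uncountably many events need not be null, which is why countable generation matters.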

Despite its generality, the result of von Weizsäcker is rarely used in the
literature. In many cases the result is difficult to apply, as a tractable
characterization of the conditional measure $\mathbf{P}(\cdot|\mathcal{F})$ is
typically not available. In our
setting, however, a fortuitous observation makes this approach much more 
attractive: when conditioned on the observations, the signal process remains 
an (albeit nonhomogeneous) Markov process whose transition probabilities 
depend on the observed sample path of the observation process. This obser- 
vation dates back to the work of Stratonovich [33], and has recently been 
applied to obtain quantitative stability results for various special filtering 
models [7, 20, 35]. In these references a time horizon $N$ is fixed and the
signal is considered under the conditional measure
$\mathbf{P}(\cdot|\mathcal{F}^Y_{[0,N]})$, while we will
work under the conditional measure $\mathbf{P}(\cdot|\mathcal{F}^Y_{[0,\infty[})$,
but this difference does not
affect the Markov property of the conditional signal.

Our basic strategy is thus as follows. Note that by the above discussion 

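$$\bigcap_{n\ge0}\mathcal{F}^Y_{[0,\infty[}\vee\mathcal{F}^X_{[n,\infty[}=\mathcal{F}^Y_{[0,\infty[}\quad\mathbf{P}\text{-a.s.}$$

would be established if we could show that

$$\mathcal{T}^X=\bigcap_{n\ge0}\mathcal{F}^X_{[n,\infty[}\quad\text{is }\mathbf{P}(\cdot|\mathcal{F}^Y_{[0,\infty[})\text{-a.s. trivial }\mathbf{P}\text{-a.s.}$$

We therefore aim to show that the signal $(X_n)_{n\ge0}$, which is a
nonhomogeneous Markov process under the regular conditional probability
$\mathbf{P}(\cdot|\mathcal{F}^Y_{[0,\infty[})$,
has trivial tail $\sigma$-field $\mathcal{T}^X$ for almost every observation path, provided our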
ergodicity and nondegeneracy assumptions are satisfied. The time-reversed 
result follows similarly. 

1.2. Markov chains in random environments. To obtain our main result, 
we must now show that tail triviality of the signal process under the condi- 
tional measure is inherited from the ergodicity of the signal process under 
the original probability measure. In the following, we will often refer to the 
signal process under the conditional measure as the conditional signal. 

To fix some ideas, consider the case of a time homogeneous finite state 
Markov chain. In this setting, ergodicity (and hence tail triviality) is de- 
termined entirely by the graph of the chain, and not by the precise values 
of the transition probabilities. In particular, for one such chain to inherit 
ergodicity from another chain, it suffices that the two chains have the same 
graph, or, in probabilistic terms, that their transition probabilities are mu- 
tually absolutely continuous. That a similar statement holds in a general 
state space can be inferred, for example, from [28], Theorem 2.1. 
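
The finite state picture can be checked numerically. The following sketch uses two hypothetical three-state transition matrices `P1` and `P2` with the same graph (the same zero pattern) but different transition probabilities, and compares the laws of each chain started from different states. The helper names and all numbers are assumptions made for this illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def same_graph(P: Matrix, Q: Matrix) -> bool:
    """True if P and Q have the same zero pattern, i.e. the rows P[i] and
    Q[i] are mutually absolutely continuous for every state i."""
    return all((p > 0) == (q > 0)
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q))


def law_at(P: Matrix, start: int, n: int) -> List[float]:
    """Law of the chain at time n when started in the state `start`."""
    m = len(P)
    law = [1.0 if i == start else 0.0 for i in range(m)]
    for _ in range(n):
        law = [sum(law[i] * P[i][j] for i in range(m)) for j in range(m)]
    return law


def tv_gap(P: Matrix, n: int) -> float:
    """Largest total variation norm between the time-n laws of the chain
    over all pairs of starting states."""
    m = len(P)
    laws = [law_at(P, s, n) for s in range(m)]
    return max(sum(abs(a - b) for a, b in zip(laws[s], laws[t]))
               for s in range(m) for t in range(m))


# Two chains on {0, 1, 2} with the same graph: 0 -> {0, 1}, 1 -> {1, 2}, 2 -> {0, 2}.
P1 = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
P2 = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8], [0.3, 0.0, 0.7]]

gap1 = tv_gap(P1, 100)
gap2 = tv_gap(P2, 100)
```

Both gaps are numerically zero after 100 steps: the two chains forget their initial state, as expected from the shared graph, even though their transition probabilities and stationary laws differ.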

The problem in our setting is that the conditional signal is not time
homogeneous. Nonetheless, the transition probability of the conditional signal
$K_n(x,\cdot)=\mathbf{P}(X_n\in\cdot\,|\,X_{n-1}=x,\mathcal{F}^Y_{[0,\infty[})$
satisfies a key homogeneity property:
it is easily seen [using the stationarity of $\mathbf{P}$ and the Markov property
of $(X_n,Y_n)_{n\in\mathbb{Z}}$] that $n\mapsto K_n$ is a stationary stochastic
process. The conditional
signal is thus a Markov chain in a random environment in the terminology 
of Cogburn, who established [11], Section 3, that the ergodicity of such a 
process in a finite (or countable) state space is determined by its graph in 
essentially the same manner as for time homogeneous chains. This suggests 
that to prove our result, it suffices to show that the transition probabilities 
of the conditional signal and of the signal are equivalent. 

As is perhaps to be expected, things are not quite so straightforward 
in practice. First, even in a finite state space, the conditional signal under
$\mathbf{P}(\cdot|\mathcal{F}^Y_{[0,\infty[})$ does not fit in the framework of
Cogburn as the ergodic theory
of Markov chains in random environments relies on the availability of all
environmental variables $(Y_k)_{k\in\mathbb{Z}}$. In order to apply the result of
Cogburn, we must therefore condition not on $\mathcal{F}^Y_{[0,\infty[}$ but on
$\mathcal{F}^Y_{\mathbb{Z}}$. It is then necessary
to establish two things: that

$$\mathbf{P}(X_n\in\cdot\,|\,X_{n-1}=x,\mathcal{F}^Y_{\mathbb{Z}})\sim\mathbf{P}(X_n\in\cdot\,|\,X_{n-1}=x)\quad\text{for all }x\ \ \mathbf{P}\text{-a.s.},$$

so that the ergodicity of the signal process under $\mathbf{P}$ implies the
ergodicity of the signal process under
$\mathbf{P}(\cdot|\mathcal{F}^Y_{\mathbb{Z}})$ by the result of Cogburn, and that

$$\mathbf{P}((X_n)_{n\ge0}\in\cdot\,|\,\mathcal{F}^Y_{\mathbb{Z}})\sim\mathbf{P}((X_n)_{n\ge0}\in\cdot\,|\,\mathcal{F}^Y_{[0,\infty[})\quad\mathbf{P}\text{-a.s.},$$

so that triviality of $\mathcal{T}^X$ under
$\mathbf{P}(\cdot|\mathcal{F}^Y_{\mathbb{Z}})$ implies triviality of
$\mathcal{T}^X$ under $\mathbf{P}(\cdot|\mathcal{F}^Y_{[0,\infty[})$. We will
prove these identities in Sections 3 and 4 using a coupling argument; it is
here that the nondegeneracy of the observations is required. Once these facts
have been established, von Weizsäcker's argument
completes the proof.

Unlike Cogburn's results, however, our results are not restricted to fi- 
nite or countable state spaces. Our first order of business is therefore to 
extend the necessary result from [11] to the setting of general Polish state 
spaces. As with ordinary Markov chains in general state spaces, the general 
case requires significantly more sophisticated tools than are needed in the 
countable setting. Our general result in Section 2 is inspired by the elegant 
martingale methods of Derriennic [16] and of Papangelou [28] for ordinary 
Markov chains in general state spaces. 

1.3. Organization of the paper. This paper is organized as follows. 

In Section 2, we introduce the general model for a Markov chain in a ran- 
dom environment. The main result, Theorem 2.3, establishes that weak er- 
godicity, tail triviality and irreducibility are equivalent for stationary Markov 
chains in random environments. This result is key for the proof of our main
result. 

In Section 3, we introduce the general hidden Markov model. We begin 
by proving that this model fits in the framework of Section 2 if we condition 
on the complete observation record $(Y_n)_{n\in\mathbb{Z}}$ (Lemma 3.3). The main result
of this section, Theorem 3.4, establishes that the conditional signal is er- 
godic provided that the ergodicity and nondegeneracy Assumptions 3.1 and 
3.2 are satisfied. The proof proceeds in two steps. First, we show that the 
result would follow from ergodicity of the signal and the equivalence of the 
conditional and unconditional transition probabilities (Lemma 3.5). Next, 
we show that this equivalence does in fact hold if we additionally assume 
nondegenerate observations (Lemma 3.8). Of independent interest is Lemma 
3.7, which is used repeatedly in the following sections. 

In Section 4, we complete the proof of the main result of this paper
(Theorem 4.2). First, we develop the argument of von Weizsäcker in our
setting (Section 4.1). The remainder of the section is devoted to proving that
$\mathbf{P}((X_n)_{n\ge0}\in\cdot\,|\,\mathcal{F}^Y_{\mathbb{Z}})\sim\mathbf{P}((X_n)_{n\ge0}\in\cdot\,|\,\mathcal{F}^Y_{[0,\infty[})$
$\mathbf{P}$-a.s. (the relevance of which
was discussed above).

Section 5 establishes that our main result implies stability of the filter 
(Theorem 5.2). The key connection between Theorems 5.2 and 4.2 is the 
expression in Lemma 5.6 for the Radon-Nikodym derivative between differ- 
ently initialized filters. 

In Section 6, we extend our main results to the continuous time setting. 

Finally, Section 7 contains a brief discussion on the implications of our 
main result for the gap in the result of Kunita [23].



2. Markov chains in random environments. 



2.1. The canonical setup and main result. Throughout this paper, we
operate in the following canonical setup. We consider the pair
$(X_n,Y_n)_{n\in\mathbb{Z}}$, where $X_n$ takes values in the Polish space $E$
and $Y_n$ takes values in the Polish space $F$. We realize these processes on
the canonical path space $\Omega=\Omega^X\times\Omega^Y$ with
$\Omega^X=E^{\mathbb{Z}}$ and $\Omega^Y=F^{\mathbb{Z}}$, such that
$X_n(x,y)=x(n)$ and $Y_n(x,y)=y(n)$.
Denote by $\mathcal{F}$ the Borel $\sigma$-field on $\Omega$, and introduce the
natural filtrations

$$\mathcal{F}^X_n=\sigma\{X_k:k\le n\},\qquad\mathcal{F}^Y_n=\sigma\{Y_k:k\le n\},\qquad\mathcal{F}_n=\mathcal{F}^X_n\vee\mathcal{F}^Y_n$$

for $n\in\mathbb{Z}$, as well as the $\sigma$-fields

$$\mathcal{F}^X_I=\sigma\{X_k:k\in I\},\qquad\mathcal{F}^Y_I=\sigma\{Y_k:k\in I\},\qquad\mathcal{F}_I=\mathcal{F}^X_I\vee\mathcal{F}^Y_I$$

for $I\subset\mathbb{Z}$. For simplicity of notation, we set

$$\mathcal{F}^X=\mathcal{F}^X_{\mathbb{Z}},\qquad\mathcal{F}^Y=\mathcal{F}^Y_{\mathbb{Z}},\qquad\mathcal{F}^X_+=\mathcal{F}^X_{[0,\infty[},\qquad\mathcal{F}^Y_+=\mathcal{F}^Y_{[0,\infty[},$$

and we will denote by $Y$ the $F^{\mathbb{Z}}$-valued random variable
$(Y_k)_{k\in\mathbb{Z}}$. The canonical shift $\Theta:\Omega\to\Omega$ is defined
as $\Theta(x,y)(m)=(x(m+1),y(m+1))$.

In the following sections we will introduce a measure on $(\Omega,\mathcal{F})$
which defines a hidden Markov model. In the present section, however, it will
be more convenient to attach a somewhat different interpretation to our
canonical setup. To this end, consider a probability kernel of the form
$P^X:E\times\Omega^Y\times\mathcal{B}(E)\to[0,1]$, where $\mathcal{B}(E)$ denotes
the Borel $\sigma$-field of $E$.
We will define a stationary probability measure $\mathbf{P}$ on
$(\Omega,\mathcal{F})$ such that the
following holds a.s. for every $n\in\mathbb{Z}$:

$$\mathbf{P}(X_{n+1}\in A\,|\,\mathcal{F}^X_n\vee\mathcal{F}^Y)=P^X(X_n,Y\circ\Theta^n,A).$$

Then $X_n$ is interpreted as a Markov chain in a random environment: the
environment is the sequence $Y$, and $X_n$ is a nonhomogeneous Markov process,
for almost every path $Y$, under the regular conditional probability
$\mathbf{P}(\cdot|\mathcal{F}^Y)$.

Remark 2.1. Markov chains in random environments were studied ex- 
tensively by Cogburn [10, 11] and by Orey [26] in the case that E is count- 
able. The purpose of this section is to extend a result in [11] to the general 
setting in which E is Polish. It should be noted that in these papers the 
kernel $P^X(x,y,A)$ is assumed to depend only on $y(0)$, rather than on the
entire path $y=(y(k))_{k\in\mathbb{Z}}$. This difference is immaterial, however, and the
current notation fits particularly well with the hidden Markov model which 
will be studied in the rest of the paper. 

We proceed to construct P. Our model consists of three ingredients: 

1. The probability kernel $P^X:E\times\Omega^Y\times\mathcal{B}(E)\to[0,1]$.

2. A probability kernel $\mu:\Omega^Y\times\mathcal{B}(E)\to[0,1]$ such that

$$\int P^X(z,y,A)\,\mu(y,dz)=\mu(\Theta y,A)\quad\text{for all }y\in\Omega^Y,\ A\in\mathcal{B}(E).$$

3. A probability measure $\mathbf{P}^Y$ on $(\Omega^Y,\mathcal{F}^Y)$ which is
invariant under the shift, that is,
$\mathbf{P}^Y(Y\in A)=\mathbf{P}^Y(Y\circ\Theta\in A)$ for all $A\in\mathcal{F}^Y$.

For every $n\in\mathbb{N}$, define the probability kernel
$\mathbf{P}^{(n)}:\Omega^Y\times\mathcal{F}^X_{[-n,n]}\to[0,1]$ as

$$\mathbf{P}^{(n)}_y(A)=\int I_A(x)\,P^X(x(n-1),\Theta^{n-1}y,dx(n))\cdots P^X(x(-n),\Theta^{-n}y,dx(-n+1))\,\mu(\Theta^{-n}y,dx(-n)).$$

Then $\mathbf{P}^{(n+1)}_y|_{\mathcal{F}^X_{[-n,n]}}=\mathbf{P}^{(n)}_y$, so that
we can define a probability kernel

$$\mathbf{P}_\cdot:\Omega^Y\times\mathcal{F}^X\to[0,1],\qquad\mathbf{P}_y|_{\mathcal{F}^X_{[-n,n]}}=\mathbf{P}^{(n)}_y\quad\text{for all }n,y$$

by the usual Kolmogorov extension argument. We now define the probability
measure $\mathbf{P}$ on $(\Omega,\mathcal{F})$ by setting

$$\mathbf{P}(A)=\int I_A(x,y)\,\mathbf{P}_y(dx)\,\mathbf{P}^Y(dy)\quad\text{for all }A\in\mathcal{F}.$$

In addition to the probability measure $\mathbf{P}$ and the kernel
$\mathbf{P}_y$, we introduce a probability kernel
$\mathbf{P}_{\cdot,\cdot}:E\times\Omega^Y\times\mathcal{F}^X_+\to[0,1]$ by setting
for $A\in\mathcal{F}^X_{[0,n]}$

$$\mathbf{P}_{z,y}(A)=\int I_A(x)\,P^X(x(n-1),\Theta^{n-1}y,dx(n))\cdots P^X(x(1),\Theta y,dx(2))\,P^X(x(0),y,dx(1))\,\delta_z(dx(0)),$$

where $\delta_z(A)=I_A(z)$, and again extending by the Kolmogorov extension
argument. The following is an easy consequence of our definitions.

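Lemma 2.2. The following properties hold true:

1. The following holds for all $A\in\mathcal{F}^X_+$, $z\in E$, $y\in\Omega^Y$:

$$\mathbf{E}_{z,y}(I_A\circ\Theta)=\int P^X(z,y,dz')\,\mathbf{P}_{z',\Theta y}(A).$$

2. $\mathbf{P}_{\Theta y}(A)=\mathbf{E}_y(I_A\circ\Theta)$ for all
$y\in\Omega^Y$, $A\in\mathcal{F}^X$.

3. $\mathbf{P}$ is invariant under the shift $\Theta:\Omega\to\Omega$, that is,
$\mathbf{P}((X_k,Y_k)_{k\in\mathbb{Z}}\in A)=\mathbf{P}((X_{k+n},Y_{k+n})_{k\in\mathbb{Z}}\in A)$
for all $A\in\mathcal{F}$, $n\in\mathbb{Z}$.

4. The following hold $\mathbf{P}$-a.s. for $A\in\mathcal{F}^X$,
$B\in\mathcal{F}^X_+$, $n\in\mathbb{Z}$:

$$\mathbf{E}(I_A\circ\Theta^n\,|\,\mathcal{F}^Y)=\mathbf{P}_{Y\circ\Theta^n}(A),\qquad\mathbf{E}(I_B\circ\Theta^n\,|\,\mathcal{F}^Y\vee\mathcal{F}^X_n)=\mathbf{P}_{X_n,Y\circ\Theta^n}(B).$$

Proof. Elementary. □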

The goal of this section is to prove the following theorem. In the case that 
E is countable, a similar result can be found in [11], Section 3. 

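Theorem 2.3. The following are equivalent.

1. $\|\mathbf{P}_{z,y}(X_n\in\cdot)-\mathbf{P}_{z',y}(X_n\in\cdot)\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0$
for $(\mu\otimes\mu)\mathbf{P}^Y$-a.e. $(z,z',y)$.

2. The tail $\sigma$-field $\mathcal{T}^X=\bigcap_{n\ge0}\mathcal{F}^X_{[n,\infty[}$
is a.s. trivial in the following sense:

$$\mathbf{P}_{z,y}(A)=\mathbf{P}_{z,y}(A)^2=\mathbf{P}_{z',y}(A)\quad\text{for all }A\in\mathcal{T}^X\text{ and }(z,z',y)\in H,$$

where $H$ is a fixed set (independent of $A$) of $(\mu\otimes\mu)\mathbf{P}^Y$-full measure.

3. For $(\mu\otimes\mu)\mathbf{P}^Y$-a.e. $(z,z',y)$, there is an $n\in\mathbb{N}$
such that the measures $\mathbf{P}_{z,y}(X_n\in\cdot)$ and
$\mathbf{P}_{z',y}(X_n\in\cdot)$ are not mutually singular.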

When the first condition of this theorem holds, the Markov chain in the 
random environment is said to be weakly ergodic; when the second condition 
holds, it is said to be tail trivial; and when the last condition holds, it is 
said to be irreducible. Our goal is to prove that these notions are equivalent. 

2.2. Proof of Theorem 2.3. The implication $1\Rightarrow3$ of Theorem 2.3 is
trivial; thus, it suffices to show that $2\Rightarrow1$ and $3\Rightarrow1,2$. Our approach
below is partially inspired by the martingale methods of Derriennic [16] 
and of Papangelou [28] for ordinary Markov chains in general state spaces, 
and by the work of Cogburn [11] for countable Markov chains in random 
environments. 

We begin by stating two preliminary lemmas which are in essence well- 
known results. The first lemma below shows that the total variation norm 
of a kernel is a measurable function; the second lemma shows that $2\Rightarrow1$ in
Theorem 2.3.

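Lemma 2.4. Let $(G,\mathcal{G})$ be a measurable space, $(K,\mathcal{K})$ be a
measurable space with $\mathcal{K}$ a countably generated $\sigma$-field, and
$\rho:G\times\mathcal{K}\to\mathbb{R}$ be a finite
kernel. Then the map $g\mapsto\|\rho(g,\cdot)\|_{\mathrm{TV}}$ is measurable.

Proof. As $\mathcal{K}$ is countably generated, there is a sequence
$\{\mathcal{I}_n\}$ of refining partitions
$\mathcal{I}_n=\{E^n_1,\ldots,E^n_n\}$ of $K$ such that
$\mathcal{K}=\sigma\{\mathcal{I}_n:n\in\mathbb{N}\}$. But then

$$\sum_{k=1}^n|\rho(g,E^n_k)|=\bigl\|\rho(g,\cdot)|_{\sigma\{\mathcal{I}_n\}}\bigr\|_{\mathrm{TV}}\nearrow\|\rho(g,\cdot)\|_{\mathrm{TV}}\quad\text{as }n\to\infty$$

for all $g\in G$ (see, e.g., [27], page 1635). As $g\mapsto\rho(g,E^n_k)$ is
measurable for every $k,n$, the above limit is also measurable and the result
follows. □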
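
The monotone approximation of the total variation norm by sums over refining partitions can be seen in a concrete case. The sketch below is an illustration under assumptions of its own: it takes the signed measure on [0, 1) with density sin(6πx), whose total variation norm is 2/π, and uses dyadic partitions with 2^level cells (a convenient choice of refining partitions, not the specific sequence of the lemma).

```python
import math
from typing import List


def rho(a: float, b: float) -> float:
    """Signed measure of [a, b) for the density sin(6*pi*x) on [0, 1)."""
    return (math.cos(6 * math.pi * a) - math.cos(6 * math.pi * b)) / (6 * math.pi)


def partition_sum(level: int) -> float:
    """Sum of |rho(E)| over the dyadic partition of [0, 1) into 2**level cells."""
    cells = 2 ** level
    return sum(abs(rho(k / cells, (k + 1) / cells)) for k in range(cells))


# Partition sums along the refining sequence of dyadic partitions.
sums: List[float] = [partition_sum(level) for level in range(11)]

# Exact total variation norm: the integral of |sin(6*pi*x)| over [0, 1).
total_variation = 2 / math.pi
```

The coarsest partition gives a sum of zero, because the positive and negative parts of the measure cancel on the whole interval; refining the partition separates them, and the sums increase to 2/π.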

The proof of the following result follows closely along the lines of the proof 
of [29], Proposition 6.2.4, and is therefore omitted. 

Lemma 2.5. Let $H$ be a set of $(\mu\otimes\mu)\mathbf{P}^Y$-full measure. If

$$\mathbf{P}_{z,y}(A)=\mathbf{P}_{z,y}(A)^2=\mathbf{P}_{z',y}(A)\quad\text{for all }A\in\mathcal{T}^X\text{ and }(z,z',y)\in H,$$

then $\|\mathbf{P}_{z,y}(X_n\in\cdot)-\mathbf{P}_{z',y}(X_n\in\cdot)\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0$
for all $(z,z',y)\in H$. In particular, if condition 2 of Theorem 2.3 holds,
then so does condition 1.

Before we proceed, we state an additional lemma on general Markov 
chains which will be used several times. The construction of the set H be- 
low follows closely along the lines of [27], pages 1636-1637, so the proof is 
omitted. 

Lemma 2.6. Let $\mathbf{P}_z$ be the law of a Markov process $(Z_k)_{k\ge0}$
given $Z_0=z$, and let $\nu$ be a stationary probability for this Markov process.
Then for any set $\tilde H$ of $\nu$-full measure, there is a subset
$H\subset\tilde H$ of $\nu$-full measure such that

$$\mathbf{P}_z(Z_n\in H\text{ for all }n\ge0)=1\quad\text{for all }z\in H.$$

We now proceed with the proof of Theorem 2.3. Let us introduce certain
skew Markov chains which will be useful in what follows. Define
$U_n=(X_n,Y\circ\Theta^n)$; then evidently $U_n$ is an $E\times\Omega^Y$-valued
stationary Markov chain under $\mathbf{P}$, whose stationary measure
$\lambda(A)=\mathbf{P}(U_n\in A)$ for all $n\in\mathbb{Z}$,
$A\in\mathcal{B}(E\times\Omega^Y)$ and transition probability kernel
$P^U:E\times\Omega^Y\times\mathcal{B}(E\times\Omega^Y)\to[0,1]$ are given by

$$\lambda(A)=\int I_A(z,y)\,\mu(y,dz)\,\mathbf{P}^Y(dy),\qquad P^U(z,y,B\times C)=P^X(z,y,B)\,I_C(\Theta y),$$

while $U_n$ is a Markov process with the same transition probability kernel
$P^U$ but with the initial measures $\delta_{z,y}$ and
$\mu(y,\cdot)\otimes\delta_y$ under $\mathbf{P}_{z,y}$ and $\mathbf{P}_y$,
respectively. In addition to this skew Markov chain, it will be convenient to
construct a coupling of two copies $U_n=(X_n,Y\circ\Theta^n)$ and
$U'_n=(X'_n,Y'\circ\Theta^n)$ of the skew chain such that $Y=Y'$. To construct
such a coupling, we define an $E\times E\times\Omega^Y$-valued Markov process
$V_n=(X_n,X'_n,Y\circ\Theta^n)$ with transition probability kernel

$$P^V(z,z',y,B\times C\times D)=P^X(z,y,B)\,P^X(z',y,C)\,I_D(\Theta y).$$

Note that the probability measure on $E\times E\times\Omega^Y$,

$$\lambda(A)=\int I_A(z,z',y)\,\mu(y,dz)\,\mu(y,dz')\,\mathbf{P}^Y(dy)=((\mu\otimes\mu)\mathbf{P}^Y)(A),$$

is an invariant measure for the transition probability $P^V$. We will construct
in the usual way a probability kernel
$\mathbf{Q}_{\cdot,\cdot,\cdot}:E\times E\times\Omega^Y\times\mathcal{B}(E\times E\times\Omega^Y)^{\mathbb{Z}_+}\to[0,1]$
such that $\mathbf{Q}_{z,z',y}$ is the law of $(V_n)_{n\ge0}$ with
$V_0\sim\delta_{z,z',y}$. Note that under $\mathbf{Q}_{z,z',y}$, the processes
$(X_n)_{n\ge0}$ and $(X'_n)_{n\ge0}$ are independent and their laws
coincide with the law of $(X_n)_{n\ge0}$ under $\mathbf{P}_{z,y}$ and
$\mathbf{P}_{z',y}$, respectively.
Define the sequence of measurable functions

$$\beta_n(z,z',y)=\|\mathbf{P}_{z,y}(X_n\in\cdot)-\mathbf{P}_{z',y}(X_n\in\cdot)\|_{\mathrm{TV}},\quad n\in\mathbb{N}.$$

Note that $\beta_n$ is nonincreasing with $n$, so that
$\beta(z,z',y)=\lim_{n\to\infty}\beta_n(z,z',y)$
is well defined and measurable. We wish to prove that condition 3 of
Theorem 2.3 implies that $\beta(z,z',y)=0$ $(\mu\otimes\mu)\mathbf{P}^Y$-a.e. We
will do this in two
steps. First, following Derriennic [16] (see also Ornstein and Sucheston [27]),
we prove a zero-two law for $\beta(z,z',y)$ which asserts that either conditions 1
and 2 of Theorem 2.3 hold, or else $\beta(z,z',y)$ attains values arbitrarily close
to 2. In the second step, we will show that condition 3 of Theorem 2.3 rules
out the latter possibility.
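
The behavior of the total variation distances between two copies of the chain driven by a common environment can be illustrated in a toy case. The sketch below rests entirely on assumptions made for the example: a two-state chain, a hypothetical pair of strictly positive kernels selected by the current environment value y(n) in {0, 1}, and an i.i.d. (hence stationary) environment. It computes the distances from time 0 onward for the two initial states.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical transition matrices, selected by the current environment value.
KERNELS: Dict[int, List[List[float]]] = {
    0: [[0.9, 0.1], [0.3, 0.7]],
    1: [[0.4, 0.6], [0.8, 0.2]],
}


def step(law: Sequence[float], K: Sequence[Sequence[float]]) -> List[float]:
    """Push a law on {0, 1} through the transition matrix K."""
    m = len(law)
    return [sum(law[i] * K[i][j] for i in range(m)) for j in range(m)]


def beta_path(z: int, z_prime: int, env: Sequence[int]) -> List[float]:
    """Total variation norms between the laws of X_n started from z and
    from z_prime, for n = 0, ..., len(env), along the environment env."""
    p = [1.0 if i == z else 0.0 for i in range(2)]
    q = [1.0 if i == z_prime else 0.0 for i in range(2)]
    betas = [sum(abs(a - b) for a, b in zip(p, q))]
    for y_n in env:
        # Both copies use the same kernel, chosen by the shared environment.
        p, q = step(p, KERNELS[y_n]), step(q, KERNELS[y_n])
        betas.append(sum(abs(a - b) for a, b in zip(p, q)))
    return betas


rng = random.Random(1)
env = [rng.randint(0, 1) for _ in range(60)]  # i.i.d. environment path
betas = beta_path(0, 1, env)
```

The list `betas` starts at the maximal value 2 (the two point masses are mutually singular), never increases, and decays to zero. In this toy case the chain is irreducible in the sense of condition 3 and the distances indeed vanish, so the alternative of values close to 2 does not occur.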

Proposition 2.7 (Zero-two law). Let $\tilde H$ be a given set of
$(\mu\otimes\mu)\mathbf{P}^Y$-full measure. Then one or the other of the
following possibilities must hold true:

1. Condition 2 of Theorem 2.3 holds for a subset $H\subset\tilde H$ of
$(\mu\otimes\mu)\mathbf{P}^Y$-full measure, and $\beta(z,z',y)=0$ for all
$(z,z',y)\in H$.

2. There is a $y\in\Omega^Y$ such that the following holds: for any
$\varepsilon>0$, there is a $(z,z',y')\in\tilde H$ with $y'=\Theta^ny$ for some
$n\in\mathbb{N}$ and $\beta(z,z',y')\ge2-\varepsilon$.

Proof. Let $H\subset\tilde H$ be the subset constructed through Lemma 2.6. It
suffices to show that if condition 2 of Theorem 2.3 does not hold on $H$, then
the second possibility in the statement of the current proposition must hold
true. Indeed, if condition 2 of Theorem 2.3 does hold on $H$, then
$\beta(z,z',y)=0$ for all $(z,z',y)\in H$ by Lemma 2.5 and, thus, the first
possibility holds true.

We suppose, therefore, that condition 2 of Theorem 2.3 does not hold on
$H$. Then we may clearly choose a $(z,z',y)\in H$ and an $A\in\mathcal{T}^X$ such
that we have either $\mathbf{P}_{z,y}(A)\ne\mathbf{P}_{z',y}(A)$ or
$0<\mathbf{P}_{z,y}(A)<1$. Let us now define

$$Z=2I_A-1,\qquad g_n(z)=\mathbf{E}_{z,\Theta^ny}(Z\circ\Theta^{-n})\quad\text{for all }z\in E.$$

Using the first property of Lemma 2.2, it is not difficult to establish that

$$g_n(z)=\mathbf{E}_{z,\Theta^ny}(g_{n+k}(X_k))\quad\text{for all }z\in E,\ k\ge0,$$

and that

$$g_n(X_n)=\mathbf{E}_{z,y}(Z\,|\,\mathcal{F}^X_{[0,n]})\quad\mathbf{P}_{z,y}\text{-a.s. for every }z\in E.$$

In particular, $g_n(X_n)\to Z$ $\mathbf{P}_{z,y}$-a.s. for every $z\in E$ by
martingale convergence, and this implies for any $0<\varepsilon<2$ and $z\in E$ that

$$\mathbf{P}_{z,y}(g_n(X_n)>1-\varepsilon)\xrightarrow{n\to\infty}\mathbf{P}_{z,y}(A),\qquad\mathbf{P}_{z,y}(g_n(X_n)<-1+\varepsilon)\xrightarrow{n\to\infty}1-\mathbf{P}_{z,y}(A).$$

We now proceed as follows. Note that for any $0<\varepsilon<2$,

$$\mathbf{Q}_{z,z',y}(g_n(X_n)>1-\varepsilon/2\text{ and }g_n(X'_n)<-1+\varepsilon/2)=\mathbf{P}_{z,y}(g_n(X_n)>1-\varepsilon/2)\,\mathbf{P}_{z',y}(g_n(X_n)<-1+\varepsilon/2),$$

which converges as $n\to\infty$ to
$\mathbf{P}_{z,y}(A)(1-\mathbf{P}_{z',y}(A))$, and similarly,

$$\mathbf{Q}_{z,z',y}(g_n(X'_n)>1-\varepsilon/2\text{ and }g_n(X_n)<-1+\varepsilon/2)=\mathbf{P}_{z',y}(g_n(X_n)>1-\varepsilon/2)\,\mathbf{P}_{z,y}(g_n(X_n)<-1+\varepsilon/2),$$

which converges as $n\to\infty$ to
$\mathbf{P}_{z',y}(A)(1-\mathbf{P}_{z,y}(A))$. But as either
$\mathbf{P}_{z,y}(A)\ne\mathbf{P}_{z',y}(A)$ or $0<\mathbf{P}_{z,y}(A)<1$, at least
one of these expressions must be positive. Hence, for every $0<\varepsilon<2$, we
can find an $n\in\mathbb{N}$ such that

$$\mathbf{Q}_{z,z',y}(|g_n(X_n)-g_n(X'_n)|>2-\varepsilon)>0.$$

In particular, there must then be a choice of $(z,z',\Theta^ny)\in H$ such that we
have $|g_n(z)-g_n(z')|>2-\varepsilon$. It remains to note that, for all $k\ge0$,

$$\begin{aligned}
\beta_k(z,z',\Theta^ny)&=\sup_{\|f\|_\infty\le1}|\mathbf{E}_{z,\Theta^ny}(f(X_k))-\mathbf{E}_{z',\Theta^ny}(f(X_k))|\\
&\ge|\mathbf{E}_{z,\Theta^ny}(g_{n+k}(X_k))-\mathbf{E}_{z',\Theta^ny}(g_{n+k}(X_k))|\\
&=|g_n(z)-g_n(z')|>2-\varepsilon,
\end{aligned}$$

so that $\beta(z,z',\Theta^ny)\ge2-\varepsilon$. But we can repeat this procedure
for any $0<\varepsilon<2$, and this establishes that the second possibility of the
proposition holds. □

It remains to argue that condition 3 of Theorem 2.3 rules out the second 
possibility of the zero-two law. We will need the following lemma. 

Lemma 2.8. The following holds for all $(z,z',y)\in E\times E\times\Omega^Y$:

$$\beta_{n+1}(z,z',y)\le(P^V\beta_n)(z,z',y)=\int\beta_n(\tilde z,\tilde z',\tilde y)\,P^V(z,z',y,d\tilde z,d\tilde z',d\tilde y).$$

In particular, $\beta(z,z',y)\le(P^V\beta)(z,z',y)$.

Proof. Choose sets $E^n_k$ as in Lemma 2.4, and define

$$\beta^n_\ell(z,z',y)=\sum_{k=1}^n|\mathbf{P}_{z,y}(X_\ell\in E^n_k)-\mathbf{P}_{z',y}(X_\ell\in E^n_k)|.$$

Then $\beta^n_\ell\nearrow\beta_\ell$ as $n\to\infty$. But
$\beta^n_{\ell+1}\le P^V\beta^n_\ell$ follows from Jensen's inequality
and Lemma 2.2, so that $\beta_{\ell+1}\le P^V\beta_\ell$ follows by monotone
convergence. Letting $\ell\to\infty$, we obtain $\beta\le P^V\beta$ by dominated
convergence. □

The following result now essentially completes the proof. 

Proposition 2.9. Suppose that condition 3 of Theorem 2.3 holds. Then 
there is a set H of (fi <g> ^)P Y -full measure such that fi{z, z' , y) = fi(z, z! , y) < 
2 for every (z, z' , y), {z, z! , y) G H with y = Q n y for some n > 0. 

Proof. Denote by Q the law of iy n )n>o with initial measure A = (/i ® 
/i)P y . By the previous lemma, fi(V n ) is a bounded submartingale under 
Q and, hence, {P{V n )} is a Cauchy sequence in -L 1 (Q) by the martingale 
convergence theorem. But then, using the stationarity of Q, we find that 

Eq|/3(V ) - H(Vn)\ = Eq 1/304) " P(V n+k )\ for all n G N. 
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In particular, we evidently have 

/ Qz,z',yW( V o) = P(V n ) for all n)\{dz,dz' \dy) = 1 

and there is consequently a set H\ of A-full measure such that 

Qz,z', y (P(z,z',y) = (3(V n ) foralln) = l for all (z,z',y)eH 1 . 

By condition 3 of Theorem 2.3, we may choose another set H2 of A-full 
measure such that for every (z,z,y) G H2, there is an n G N such that 
P zy (X n G •) and P Zyy {X n E •) are not mutually singular. Note that the lat- 
ter implies that P Z) y(X m E •) and P Z) y(X m E •) are not mutually singular for 
every m > n, as P Zjy {X n E •) _L P Z) y(X n E •) is equivalent to (3 n (z,z,y) = 2 
and P m (z,z,y) is nonincreasing with m. Now define the set 

H 3 = {{z,z',z,z',y):(z,z',y),(z,z',y) E H u (z,z,y),(z' ,z' ,y) G H 2 }. 

Then it is easily seen that H3 has (/i ® /jl ® // ® /i)P y -full measure. 

We claim that $\beta(z,z',y)=\beta(\tilde z,\tilde z',y)$ whenever $(z,z',\tilde z,\tilde z',y)\in H_3$. To
see this, fix such a point, and choose $n\in\mathbb{N}$ such that $\mathbf{P}_{z,y}(X_n\in\cdot)$ and
$\mathbf{P}_{\tilde z,y}(X_n\in\cdot)$ are not mutually singular and $\mathbf{P}_{z',y}(X_n\in\cdot)$ and $\mathbf{P}_{\tilde z',y}(X_n\in\cdot)$
are not mutually singular. This implies, in particular, that $\mathbf{Q}_{z,z',y}(V_n\in\cdot)$ and
$\mathbf{Q}_{\tilde z,\tilde z',y}(V_n\in\cdot)$ are not mutually singular. But these measures are supported,
respectively, on the sets

\[ \Xi_1 = \{(\zeta,\zeta',\Theta^n y) : \beta(z,z',y)=\beta(\zeta,\zeta',\Theta^n y)\}, \]

\[ \Xi_2 = \{(\zeta,\zeta',\Theta^n y) : \beta(\tilde z,\tilde z',y)=\beta(\zeta,\zeta',\Theta^n y)\} \]

as $(z,z',y),(\tilde z,\tilde z',y)\in H_1$, and, as the measures are nonsingular, we must
have $\Xi_1\cap\Xi_2\ne\varnothing$. We have therefore established that $\beta(z,z',y)=\beta(\tilde z,\tilde z',y)$.
To proceed, we define

\[ \beta(y) = \int \beta(z,z',y)\,\mu(y,dz)\,\mu(y,dz'). \]

We claim that $\beta(z,z',y)=\beta(y)$ $\lambda$-a.e. Indeed, note that

\[ \int |\beta(z,z',y)-\beta(y)|\,\lambda(dz,dz',dy) \le \int |\beta(z,z',y)-\beta(\tilde z,\tilde z',y)|\,(\mu\otimes\mu\otimes\mu\otimes\mu)(y,dz,dz',d\tilde z,d\tilde z')\,\mathbf{P}^Y(dy) \]

by Jensen's inequality, and we may restrict the integral on the right-hand
side to $H_3$, as this set has full measure. Thus, the left-hand side vanishes as
claimed.
To complete the proof, let $H_4$ be a set of $\lambda$-full measure such that
$\beta(z,z',y)=\beta(y)$ for all $(z,z',y)\in H_4$. Using Lemma 2.6, we can find a subset
$H_5\subset H_4$ of $\lambda$-full measure such that we have

\[ \mathbf{Q}_{z,z',y}(V_n\in H_5\text{ for all } n\ge 0) = 1 \quad\text{for all } (z,z',y)\in H_5. \]

We now set $H = H_1\cap H_2\cap H_5$. Then evidently $\beta(z,z',y)=\beta(y)=\beta(\Theta^n y)$
for all $n\ge 0$ whenever $(z,z',y)\in H$, and $\beta(z,z',y)<2$ as condition 3 of
Theorem 2.3 holds for $(z,z',y)\in H$. The proof is easily completed. □

Let us now complete the proof of the implication 3 $\Rightarrow$ 1, 2 in Theorem 2.3.
By the zero-two law, it suffices to show that condition 3 of Theorem 2.3 rules
out the second possibility of Proposition 2.7. Assume that condition 3 of
Theorem 2.3 holds, and apply the zero-two law with the set $H$ obtained from
Proposition 2.9. If the second possibility of Proposition 2.7 holds, then there
is a $y\in\Omega^Y$ and a sequence $(z_k,z'_k,\Theta^{n_k}y)\in H$ such that
$\beta(z_k,z'_k,\Theta^{n_k}y)\to 2$ as $k\to\infty$. But by Proposition 2.9,
$\beta(z_k,z'_k,\Theta^{n_k}y)=\beta(z_1,z'_1,\Theta^{n_1}y)<2$ for all $k\ge 1$, which is a
contradiction. Hence, the proof of Theorem 2.3 is complete.

3. Weak ergodicity of conditional Markov processes. 

3.1. The hidden Markov model. Throughout this paper we will operate
in the same canonical setting as in Section 2. In this section, however, we
will initially give a different construction of the measure $\mathbf{P}$ which makes
$(X_n,Y_n)_{n\in\mathbb{Z}}$ a hidden Markov model; the signal process $X_n$ then plays the
role of the unobserved component, while the observation process $Y_n$ is the
observed component. Such hidden Markov structure is the usual setup in
which nonlinear filtering problems are of interest. We will shortly see,
however, that hidden Markov models are Markov chains in random environments
in disguise, so that the results of Section 2 apply.

As before, the signal $X_n$ takes values in the Polish space $E$ and the
observations $Y_n$ take values in the Polish space $F$. We proceed to construct a
measure $\mathbf{P}$ on the canonical path space $(\Omega,\mathcal{F})$. The hidden Markov model
consists of:

1. A probability kernel $P : E\times\mathcal{B}(E)\to[0,1]$.

2. A probability measure $\pi$ on $(E,\mathcal{B}(E))$ such that

\[ \int P(z,A)\,\pi(dz) = \pi(A) \quad\text{for all } A\in\mathcal{B}(E). \]

3. A probability kernel $\Phi : E\times\mathcal{B}(F)\to[0,1]$.
We now construct $\mathbf{P}$ as follows. For every $n\in\mathbb{N}$, we can define the probability
measure $\mathbf{P}^{(n)}$ on $\mathcal{F}_{[-n,n]}$ as

\[ \mathbf{P}^{(n)}(A) = \int I_A(x,y)\,\Phi(x(n),dy(n))\cdots\Phi(x(-n),dy(-n)) \times P(x(n-1),dx(n))\cdots P(x(-n),dx(-n+1))\,\pi(dx(-n)). \]

Then $\mathbf{P}^{(n+1)}|_{\mathcal{F}_{[-n,n]}} = \mathbf{P}^{(n)}$, so that we can construct the probability measure

\[ \mathbf{P} : \mathcal{F}\to[0,1], \qquad \mathbf{P}|_{\mathcal{F}_{[-n,n]}} = \mathbf{P}^{(n)} \quad\text{for all } n\in\mathbb{N} \]

by the Kolmogorov extension theorem. Note that under $\mathbf{P}$, the signal $X_n$
is a stationary Markov chain with transition probability kernel $P(z,A)$ and
stationary probability measure $\pi$, while, conditionally on the signal, the
observations are independent at different times and $Y_n$ has law $\Phi(X_n,\cdot)$. We
also remark that the joint process $(X_n,Y_n)_{n\in\mathbb{Z}}$ is easily seen to be itself a
stationary Markov chain.
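
For concreteness, the construction can be sketched in the simplest possible case, where $E$ and $F$ are finite. The following is only an illustration under assumptions of our own: the two-state kernels `P` and `PHI`, the stationary law `PI`, and the helper names `draw`, `simulate_hmm` and `push_forward` are hypothetical and do not appear in the text. The sketch samples the signal from the stationary law and draws each observation from the observation kernel evaluated at the current signal value.

```python
import random

# Hypothetical two-state example; all numbers are illustrative assumptions.
P = [[0.9, 0.1],
     [0.2, 0.8]]             # transition kernel P(z, .)
PI = [2.0 / 3.0, 1.0 / 3.0]  # stationary law, chosen so that PI P = PI
PHI = [[0.7, 0.3],
       [0.4, 0.6]]           # observation kernel Phi(z, .)


def draw(weights, rng):
    """Sample an index from a finite probability vector."""
    u, acc = rng.random(), 0.0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    return len(weights) - 1


def simulate_hmm(n, rng):
    """Return (signal, observations) at times 0..n-1, started from PI."""
    xs, ys = [], []
    x = draw(PI, rng)
    for _ in range(n):
        xs.append(x)
        ys.append(draw(PHI[x], rng))  # Y_k depends on X_k only
        x = draw(P[x], rng)           # Markov step of the signal
    return xs, ys


def push_forward(law, kernel):
    """One step of the chain: (law P)(j) = sum_i law(i) P(i, j)."""
    return [sum(law[i] * kernel[i][j] for i in range(len(law)))
            for j in range(len(kernel[0]))]
```

Calling `simulate_hmm(50, random.Random(0))` returns one sample path of the pair on a finite window, and `push_forward(PI, P)` can be used to check the invariance condition in item 2 numerically.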

In addition to the probability measure $\mathbf{P}$, we introduce the probability
kernel $\mathbf{P}^{\cdot} : E\times\mathcal{F}_+\to[0,1]$ such that $\mathbf{P}^z$ is the law of $(X_n,Y_n)_{n\ge 0}$ started at
$X_0 = z$ [i.e., under $\mathbf{P}^z$, the signal $(X_n)_{n\ge 0}$ is a Markov chain with transition
probability kernel $P$ and initial measure $X_0\sim\delta_z$, the observations $(Y_n)_{n\ge 0}$
are conditionally independent given the signal, and $Y_n$ has conditional law
$\Phi(X_n,\cdot)$ given the signal]. For any probability measure $\nu$ on $(E,\mathcal{B}(E))$, we define
the probability measure

\[ \mathbf{P}^{\nu}(A) = \int I_A(x,y)\,\mathbf{P}^z(dx,dy)\,\nu(dz) \quad\text{for all } A\in\mathcal{F}_+. \]

Note that $\mathbf{P}^{\pi}$ is in fact the restriction of $\mathbf{P}$ to $\mathcal{F}_+$.

We now introduce two assumptions on the hidden Markov model which 
will play an important role in our main results. 

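Assumption 3.1 (Ergodicity). The following holds:

\[ \|\mathbf{P}^z(X_n\in\cdot)-\pi\|_{\mathrm{TV}} \xrightarrow{n\to\infty} 0 \quad\text{for } \pi\text{-a.e. } z\in E. \]

Assumption 3.2 (Nondegeneracy). There exists a probability measure
$\varphi$ on $\mathcal{B}(F)$ and a strictly positive measurable function $g : E\times F\to\,]0,\infty[$
such that

\[ \Phi(z,A) = \int I_A(u)\,g(z,u)\,\varphi(du) \quad\text{for all } A\in\mathcal{B}(F),\ z\in E. \]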

We do not automatically assume in the following that either of these 
assumptions is in force, but we will impose them explicitly where they are 
needed. 
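
As an illustration of Assumption 3.1, the total variation distance to the stationary law can be computed exactly for a finite chain. The sketch below is not part of the argument: the two-state kernel `P`, its stationary law `PI`, and the function names are assumptions made for the example. We use the convention that the total variation norm of a difference of probability measures takes values in $[0,2]$, so that the value 2 corresponds to mutual singularity.

```python
# Hypothetical two-state chain; the numbers are illustrative assumptions.
P = [[0.9, 0.1],
     [0.2, 0.8]]
PI = [2.0 / 3.0, 1.0 / 3.0]  # stationary law of P


def tv_norm(mu, nu):
    """Total variation norm of mu - nu, with values in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(mu, nu))


def step(law, kernel):
    """Law of X_{n+1} given the law of X_n."""
    return [sum(law[i] * kernel[i][j] for i in range(len(law)))
            for j in range(len(kernel[0]))]


def tv_to_stationarity(kernel, pi, z, n_max):
    """Return the list of ||P^z(X_n in .) - pi||_TV for n = 0..n_max."""
    law = [1.0 if i == z else 0.0 for i in range(len(pi))]
    distances = []
    for _ in range(n_max + 1):
        distances.append(tv_norm(law, pi))
        law = step(law, kernel)
    return distances
```

For this particular kernel the distances decay geometrically, so the ergodicity assumption holds for every initial state; the nondegeneracy assumption is automatic on a finite observation space whenever every entry of the observation kernel is strictly positive.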
3.2. The conditional signal process. Although we have constructed
the measure $\mathbf{P}$ in a rather different manner, the hidden Markov model
introduced in the previous subsection is in fact a disguised Markov chain in
a random environment in the sense of Section 2. This is established in the
following lemma.

Lemma 3.3. There exist probability kernels $P^X : E\times\Omega^Y\times\mathcal{B}(E)\to[0,1]$
and $\mu : \Omega^Y\times\mathcal{B}(E)\to[0,1]$, and a probability measure $\mathbf{P}^Y$ on $(\Omega^Y,\mathcal{F}^Y)$, such
that the conditions of Section 2 are satisfied and the measure $\mathbf{P}$ constructed
there coincides with the measure $\mathbf{P}$ constructed in the current section. In
particular,

\[ P^X(X_n,Y\circ\Theta^n,A) = \mathbf{P}(X_{n+1}\in A\,|\,\mathcal{F}^X_n\vee\mathcal{F}^Y) \quad \mathbf{P}\text{-a.s.}, \]
\[ \mu(Y\circ\Theta^n,A) = \mathbf{P}(X_n\in A\,|\,\mathcal{F}^Y) \quad \mathbf{P}\text{-a.s.} \]

for every $A\in\mathcal{B}(E)$ and $n\in\mathbb{Z}$, and $\mathbf{P}^Y = \mathbf{P}|_{\mathcal{F}^Y}$.

Proof. Let us fix the measure $\mathbf{P}$ as defined in the current section. We
will use this measure to construct $P^X$, $\mu$ and $\mathbf{P}^Y$. Subsequently, denoting
by $\mathbf{P}'$ the probability measure on $\mathcal{F}$ constructed from $P^X$, $\mu$ and $\mathbf{P}^Y$ in
Section 2 (called $\mathbf{P}$ there), we will show that in fact $\mathbf{P}' = \mathbf{P}$.

Set $\mathbf{P}^Y = \mathbf{P}|_{\mathcal{F}^Y}$, and let $\tilde\mu : \Omega^Y\times\mathcal{B}(E)\to[0,1]$ be a regular conditional
probability of the form $\mathbf{P}(X_0\in\cdot\,|\,\mathcal{F}^Y)$. Moreover, note that

\[ \mathbf{P}(X_1\in A\,|\,\mathcal{F}^X_0\vee\mathcal{F}^Y) = \mathbf{P}(X_1\in A\,|\,\sigma(X_0)\vee\mathcal{F}^Y) \quad \mathbf{P}\text{-a.s.} \]

by the Markov property of $(X_n,Y_n)_{n\in\mathbb{Z}}$; indeed, due to the Markov property
the $\sigma$-fields $\mathcal{F}_{[1,\infty[}$ and $\mathcal{F}_{-1}$ are conditionally independent given $\sigma(X_0,Y_0)$,
so that the claim follows directly from the elementary properties of the
conditional expectation. We can therefore obtain a regular conditional
probability $\tilde P^X : E\times\Omega^Y\times\mathcal{B}(E)\to[0,1]$ of the form $\mathbf{P}(X_1\in\cdot\,|\,\mathcal{F}^X_0\vee\mathcal{F}^Y)$ [i.e.,
$\tilde P^X(X_0,Y,A) = \mathbf{P}(X_1\in A\,|\,\mathcal{F}^X_0\vee\mathcal{F}^Y)$ $\mathbf{P}$-a.s. for every $A\in\mathcal{B}(E)$]. The
regular conditional probabilities exist by the Polish assumption [21], Theorem
5.3.

Note that it follows trivially from the stationarity of $(X_n,Y_n)_{n\in\mathbb{Z}}$ that $\mathbf{P}^Y$
is invariant under $\Theta$. We now claim that for $\mathbf{P}^Y$-a.e. $y\in\Omega^Y$, we have

\[ \int \tilde P^X(z,y,A)\,\tilde\mu(y,dz) = \tilde\mu(\Theta y,A) \quad\text{for all } A\in\mathcal{B}(E). \]

To see this, note that as $\mathcal{B}(E)$ is countably generated, it suffices by a
standard monotone class argument to prove the claim for $A$ in a countable
generating algebra $\{E_n\}\subset\mathcal{B}(E)$ such that $\mathcal{B}(E) = \sigma\{E_n : n\in\mathbb{N}\}$. But note
that for fixed $n\in\mathbb{N}$,

\[ \int \tilde P^X(z,Y,E_n)\,\tilde\mu(Y,dz) = \mathbf{E}\bigl(\mathbf{P}(X_1\in E_n\,|\,\mathcal{F}^X_0\vee\mathcal{F}^Y)\,\big|\,\mathcal{F}^Y\bigr) = \mathbf{P}(X_1\in E_n\,|\,\mathcal{F}^Y), \]

while $\mathbf{P}(X_1\in E_n\,|\,\mathcal{F}^Y) = \tilde\mu(Y\circ\Theta,E_n)$ follows from

\[ \begin{aligned}
\mathbf{E}\bigl(f(Y)\{\mathbf{P}(X_0\in E_n\,|\,\mathcal{F}^Y)\circ\Theta\}\bigr) &= \mathbf{E}\bigl(f(Y\circ\Theta^{-1})\,\mathbf{P}(X_0\in E_n\,|\,\mathcal{F}^Y)\bigr) \\
&= \mathbf{E}\bigl(f(Y\circ\Theta^{-1})\,I_{E_n}(X_0)\bigr) \\
&= \mathbf{E}\bigl(f(Y)\,I_{E_n}(X_1)\bigr)
\end{aligned} \]

for every bounded measurable $f : \Omega^Y\to\mathbb{R}$, where we have twice used the
stationarity of $\mathbf{P}$. As we must only verify equality for a countable collection
$\{E_n\}$, we can indeed find a set $H\in\mathcal{F}^Y$ of $\mathbf{P}^Y$-full measure such that

\[ \int \tilde P^X(z,y,A)\,\tilde\mu(y,dz) = \tilde\mu(\Theta y,A) \quad\text{for all } A\in\mathcal{B}(E),\ y\in H. \]

We now set $\mu(y,A) = \tilde\mu(y,A)$ and $P^X(z,y,A) = \tilde P^X(z,y,A)$ for all $z\in E$, $y\in H$
and $A\in\mathcal{B}(E)$, and we set $\mu(y,A) = \pi(A)$, $P^X(z,y,A) = \pi(A)$ whenever
$y\notin H$. Then $\mu$ and $P^X$ are still versions of their defining regular conditional
probabilities and $P^X$, $\mu$, $\mathbf{P}^Y$ satisfy the conditions of Section 2. The various
identities in the statement of the lemma follow from the stationarity of $\mathbf{P}$ in
the same way as we established above that $\mathbf{P}(X_1\in E_n\,|\,\mathcal{F}^Y) = \tilde\mu(Y\circ\Theta,E_n)$.

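It remains to show that the measure $\mathbf{P}'$ constructed from $P^X$, $\mu$, $\mathbf{P}^Y$ as
in Section 2 coincides with the measure $\mathbf{P}$. It suffices to show that $\mathbf{P}'(A) =
\mathbf{P}(A)$ for every $A\in\mathcal{F}_{[-n,n]}$, $n\in\mathbb{N}$. To this end, note that for $A\in\mathcal{F}_{[-n,n]}$ we
evidently have

\[ \begin{aligned}
\mathbf{P}'(A) &= \int I_A(x,y)\,P^X(x(n-1),\Theta^{n-1}y,dx(n))\cdots \\
&\qquad\times P^X(x(-n),\Theta^{-n}y,dx(-n+1))\,\mu(\Theta^{-n}y,dx(-n))\,\mathbf{P}^Y(dy) \\
&= \mathbf{E}\bigl(\mathbf{E}\bigl(\mathbf{E}\bigl(\cdots\mathbf{E}\bigl(\mathbf{E}(I_A\,|\,\mathcal{F}^X_{n-1}\vee\mathcal{F}^Y)\,\big|\,\mathcal{F}^X_{n-2}\vee\mathcal{F}^Y\bigr)\cdots\big|\,\mathcal{F}^X_{-n}\vee\mathcal{F}^Y\bigr)\,\big|\,\mathcal{F}^Y\bigr)\bigr) \\
&= \mathbf{P}(A).
\end{aligned} \]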

Thus, the proof is complete. □ 

From this point onward we will fix $P^X$, $\mu$, $\mathbf{P}^Y$ as defined in the previous
lemma. In particular, this allows us to define the probability kernels $\mathbf{P}_y$ and
$\mathbf{P}_{z,y}$ as in Section 2, and these are easily seen to be versions of the regular
conditional probabilities $\mathbf{P}(\cdot\,|\,\mathcal{F}^Y)$ and $\mathbf{P}(\cdot\,|\,\mathcal{F}^X_0\vee\mathcal{F}^Y)$, respectively. Under
$\mathbf{P}_y$, the process $(X_n)_{n\in\mathbb{Z}}$ has the law of the signal process conditioned on
the observations $(Y_n)_{n\in\mathbb{Z}}$; we will refer to this process as the conditional
signal process. The main purpose of this section is to obtain a sufficient
condition for the conditional signal to be weakly ergodic, that is, for any
(hence all) of the conditions of Theorem 2.3 to hold in the current setting.
In Sections 4-6, we will see that this question has important consequences
for the asymptotic properties of nonlinear filters.
Intuitively, it seems plausible that the weak ergodicity of the conditional
signal process is inherited from the ergodicity of the (unconditional) signal
process, that is, that weak ergodicity of the conditional signal follows from
Assumption 3.1. The counterexample in [1] illustrates, however, that this
need not be the case. The following theorem, which is the main result of
this section, shows that weak ergodicity of the conditional signal follows
nonetheless if we also assume nondegeneracy of the observations
(Assumption 3.2).

Theorem 3.4. Suppose that both Assumptions 3.1 and 3.2 are in force. 
Then any (hence all) of the conditions of Theorem 2.3 hold true. 

The proof of this result is contained in the following subsections. 

3.3. Weak ergodicity of the conditional signal. The strategy of the proof 
of Theorem 3.4 is to show that condition 3 of Theorem 2.3 follows from 
Assumptions 3.1 and 3.2. In this subsection we prove that condition 3 of 
Theorem 2.3 follows from Assumption 3.1 and a certain absolute continuity 
assumption; that the latter follows from Assumptions 3.1 and 3.2 is estab- 
lished in the next subsection. 



Lemma 3.5. Suppose Assumption 3.1 holds, and that there is a strictly
positive measurable function $h : E\times\Omega^Y\times E\to\,]0,\infty[$ such that for $\mu\mathbf{P}^Y$-a.e.
$(z,y)$,

\[ P^X(z,y,A) = \int I_A(\tilde z)\,h(z,y,\tilde z)\,P(z,d\tilde z) \quad\text{for all } A\in\mathcal{B}(E). \]

Then condition 3 of Theorem 2.3 holds.

Proof. First, we note that Assumption 3.1 implies that there is a set
$H_1$ of $(\mu\otimes\mu)\mathbf{P}^Y$-full measure such that for any $(z,z',y)\in H_1$, there is an
$n\in\mathbb{N}$ such that $\mathbf{P}^z(X_n\in\cdot)$ and $\mathbf{P}^{z'}(X_n\in\cdot)$ are not mutually singular. To
see this, note that

\[ \begin{aligned}
\int \|\mathbf{P}^z(X_n\in\cdot)-\mathbf{P}^{z'}(X_n\in\cdot)\|_{\mathrm{TV}}\,\mu(y,dz)\,\mu(y,dz')\,\mathbf{P}^Y(dy)
&\le 2\int \|\mathbf{P}^z(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\,\mu(y,dz)\,\mathbf{P}^Y(dy) \\
&= 2\int \|\mathbf{P}^z(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\,\pi(dz) \xrightarrow{n\to\infty} 0
\end{aligned} \]

by Assumption 3.1. But as $\|\mathbf{P}^z(X_n\in\cdot)-\mathbf{P}^{z'}(X_n\in\cdot)\|_{\mathrm{TV}}$ is nonincreasing
and uniformly bounded, we find that $\|\mathbf{P}^z(X_n\in\cdot)-\mathbf{P}^{z'}(X_n\in\cdot)\|_{\mathrm{TV}}\to 0$ as
$n\to\infty$ for $(\mu\otimes\mu)\mathbf{P}^Y$-a.e. $(z,z',y)$, which establishes the claim.

Now let $H_2$ be a set of $\mu\mathbf{P}^Y$-full measure such that the absolute continuity
condition in the statement of the lemma holds true for all $(z,y)\in H_2$. By
Lemma 2.6, there is a subset $H_3\subset H_2$ of $\mu\mathbf{P}^Y$-full measure such that for
every $(z,y)\in H_3$ we have $\mathbf{P}_{z,y}((X_n,\Theta^n y)\in H_3\text{ for all } n\ge 0) = 1$. It follows
directly that for every $(z,y)\in H_3$, $n\in\mathbb{N}$ and $A\in\mathcal{B}(E)$, we have

\[ \mathbf{P}_{z,y}(X_n\in A) = \mathbf{E}^z\bigl(h(X_0,y,X_1)\cdots h(X_{n-1},\Theta^{n-1}y,X_n)\,I_A(X_n)\bigr). \]

In particular, $\mathbf{P}_{z,y}(X_n\in\cdot)\sim\mathbf{P}^z(X_n\in\cdot)$ for all $(z,y)\in H_3$ and $n\in\mathbb{N}$.
To complete the proof, define the following set:

\[ H_4 = \{(z,z',y) : (z,z',y)\in H_1,\ (z,y),(z',y)\in H_3\}. \]

Then $H_4$ has $(\mu\otimes\mu)\mathbf{P}^Y$-full measure, and for every $(z,z',y)\in H_4$, there is
an $n\in\mathbb{N}$ such that $\mathbf{P}_{z,y}(X_n\in\cdot)$ and $\mathbf{P}_{z',y}(X_n\in\cdot)$ are not mutually singular.
□



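3.4. Nondegeneracy. Before we proceed, we will prove an elementary
result on regular conditional probabilities. The result generalizes the trivial
identity

\[ \frac{\mathbf{P}(A\,|\,B\cap C)}{\mathbf{P}(A\,|\,C)} = \frac{\mathbf{P}(B\,|\,A\cap C)}{\mathbf{P}(B\,|\,C)} \quad\text{provided } \mathbf{P}(A\cap C)>0,\ \mathbf{P}(B\cap C)>0 \]

to regular conditional probabilities in Polish spaces.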

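Lemma 3.6. Let $G_1$, $G_2$ and $K$ be Polish spaces and set $\Omega = G_1\times G_2\times K$.
We consider a probability measure $\mathbf{P}$ on $(\Omega,\mathcal{B}(\Omega))$. Denote by $\gamma_1 : \Omega\to G_1$,
$\gamma_2 : \Omega\to G_2$ and $\kappa : \Omega\to K$ the coordinate projections, and let $\mathcal{G}_1$, $\mathcal{G}_2$ and $\mathcal{K}$
be the $\sigma$-fields generated by $\gamma_1$, $\gamma_2$ and $\kappa$, respectively. Choose fixed versions
of the following regular conditional probabilities (which exist by the Polish
assumption):

\[ \Xi^K_1(g_1,\cdot) = \mathbf{P}(\kappa\in\cdot\,|\,\mathcal{G}_1)(g_1), \qquad \Xi^K_{12}(g_1,g_2,\cdot) = \mathbf{P}(\kappa\in\cdot\,|\,\mathcal{G}_1\vee\mathcal{G}_2)(g_1,g_2), \]

\[ \Xi^2_1(g_1,\cdot) = \mathbf{P}(\gamma_2\in\cdot\,|\,\mathcal{G}_1)(g_1), \qquad \Xi^2_{1K}(g_1,k,\cdot) = \mathbf{P}(\gamma_2\in\cdot\,|\,\mathcal{G}_1\vee\mathcal{K})(g_1,k), \]

where $g_1\in G_1$, $g_2\in G_2$, $k\in K$. Suppose that there exists a nonnegative
measurable function $h : G_1\times G_2\times K\to[0,\infty[$ and a set $H\subset G_1\times G_2$ such that
$\mathbf{E}(I_H(\gamma_1,\gamma_2)) = 1$ and for every $(g_1,g_2)\in H$,

\[ \Xi^K_{12}(g_1,g_2,A) = \int I_A(k)\,h(g_1,g_2,k)\,\Xi^K_1(g_1,dk) \quad\text{for all } A\in\mathcal{K}. \]

Then there is an $H'\subset G_1\times K$ with $\mathbf{E}(I_{H'}(\gamma_1,\kappa)) = 1$ so that for all $(g_1,k)\in H'$,

\[ \Xi^2_{1K}(g_1,k,B) = \int I_B(g_2)\,h(g_1,g_2,k)\,\Xi^2_1(g_1,dg_2) \quad\text{for all } B\in\mathcal{G}_2. \]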
Proof. We can evidently write (using the disintegration of measures
[21], Theorem 5.4) for every $A\in\mathcal{G}_1$, $B\in\mathcal{G}_2$, and $C\in\mathcal{K}$

\[ \begin{aligned}
\mathbf{P}(\gamma_1\in A,\ \gamma_2\in B,\ \kappa\in C)
&= \int I_A(g_1)\,I_B(g_2)\,\Xi^K_{12}(g_1,g_2,C)\,\Xi^2_1(g_1,dg_2)\,\Xi_1(dg_1) \\
&= \int I_A(g_1)\,I_C(k)\,\Xi^2_{1K}(g_1,k,B)\,\Xi^K_1(g_1,dk)\,\Xi_1(dg_1),
\end{aligned} \]

where $\Xi_1$ is the law of $\gamma_1$ under $\mathbf{P}$. Therefore,

\[ \begin{aligned}
&\int \Xi^2_{1K}(g_1,k,B)\,I_A(g_1)\,I_C(k)\,\Xi^K_1(g_1,dk)\,\Xi_1(dg_1) \\
&\qquad= \int I_B(g_2)\,h(g_1,g_2,k)\,\Xi^2_1(g_1,dg_2)\,I_A(g_1)\,I_C(k)\,\Xi^K_1(g_1,dk)\,\Xi_1(dg_1),
\end{aligned} \]

where the exchange of integration order is permitted due to the nonnegativity
of the integrand. As this holds for every $A\in\mathcal{G}_1$ and $C\in\mathcal{K}$, we obtain

\[ \Xi^2_{1K}(g_1,k,B) = \int I_B(g_2)\,h(g_1,g_2,k)\,\Xi^2_1(g_1,dg_2) \quad\text{for } \mathbf{P}\text{-a.e. } (g_1,k) \]

for every fixed $B\in\mathcal{G}_2$. But as $\mathcal{G}_2$ is countably generated, it suffices to verify
that equality holds for $B$ in a countable generating algebra, and we can thus
eliminate the dependence of the exceptional set on $B$. □

To complete the proof of Theorem 3.4, we must show that the absolute
continuity condition $P^X(z,y,\cdot)\sim P(z,\cdot)$ of Lemma 3.5 holds. Recall that
$P(z,\cdot)$ is a version of the regular conditional probability $\mathbf{P}(X_1\in\cdot\,|\,\mathcal{F}^X_0)$, while
$P^X$ is a version of the regular conditional probability $\mathbf{P}(X_1\in\cdot\,|\,\mathcal{F}^X_0\vee\mathcal{F}^Y)$. By
the Markov property, however, it is immediate that we can also consider $P$ to
be a version of the regular conditional probability $\mathbf{P}(X_1\in\cdot\,|\,\sigma(X_0))$, and $P^X$
a version of the regular conditional probability $\mathbf{P}(X_1\in\cdot\,|\,\sigma(X_0)\vee\mathcal{F}^Y_+)$. To
prove absolute continuity, we will apply the previous lemma to the law of the
triple $(X_0,X_1,(Y_k)_{k\ge 0})$. In particular, to establish that $P^X(z,y,\cdot)\sim P(z,\cdot)$,
we may equivalently investigate whether the laws of $(Y_k)_{k\ge 0}$ under different
initial conditions are equivalent.

The following result, which is of independent interest, shows that, provided
the observations are nondegenerate, two initial laws of the signal give rise
to equivalent laws of the observations whenever the signal forgets the initial
laws. This will be used below to establish that $P^X(z,y,\cdot)\sim P(z,\cdot)$.

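Lemma 3.7. Suppose Assumption 3.2 holds. Let $\nu,\bar\nu$ be probability
measures such that $\|\mathbf{P}^{\nu}(X_n\in\cdot)-\mathbf{P}^{\bar\nu}(X_n\in\cdot)\|_{\mathrm{TV}}\to 0$ as $n\to\infty$.
Then $\mathbf{P}^{\nu}|_{\mathcal{F}^Y_+}\sim\mathbf{P}^{\bar\nu}|_{\mathcal{F}^Y_+}$.

Proof. We will work on the space $\Omega' = E^{\mathbb{Z}_+}\times E^{\mathbb{Z}_+}\times F^{\mathbb{Z}_+}$, where we
write $X_n(x,x',y) = x(n)$, $X'_n(x,x',y) = x'(n)$, and $Y_n(x,x',y) = y(n)$.

We make use of the well-known fact [24], Theorem III.14.10 and (III.20.7),
that $\|\mathbf{P}^{\nu}(X_n\in\cdot)-\mathbf{P}^{\bar\nu}(X_n\in\cdot)\|_{\mathrm{TV}}\to 0$ as $n\to\infty$ implies the existence of a
successful coupling of the laws of $(X_n)_{n\ge 0}$ under $\mathbf{P}^{\nu}$ and $\mathbf{P}^{\bar\nu}$. We can thus
construct a probability measure $\mathbf{Q} : \mathcal{B}(E^{\mathbb{Z}_+}\times E^{\mathbb{Z}_+})\to[0,1]$ such that:

1. The law of $(X_n)_{n\ge 0}$ under $\mathbf{Q}$ coincides with the law of $(X_n)_{n\ge 0}$ under
$\mathbf{P}^{\nu}$;

2. The law of $(X'_n)_{n\ge 0}$ under $\mathbf{Q}$ coincides with the law of $(X_n)_{n\ge 0}$ under
$\mathbf{P}^{\bar\nu}$;

3. There is a finite random time $\tau$ such that a.s. $X_n = X'_n$ for all $n\ge\tau$.

In addition, we define a probability kernel $\mathbf{Q}^Y : E^{\mathbb{Z}_+}\times\mathcal{B}(F^{\mathbb{Z}_+})\to[0,1]$ such
that $(Y_n)_{n\ge 0}$ are independent under $\mathbf{Q}^Y(x,\cdot)$ and $\mathbf{Q}^Y(x,Y_n\in\cdot) = \Phi(x(n),\cdot)$.
Now consider the following probability measures on $\Omega'$:

\[ \mathbf{Q}_1(A) = \int I_A(x,x',y)\,\mathbf{Q}^Y(x,dy)\,\mathbf{Q}(dx,dx'), \]

\[ \mathbf{Q}_2(A) = \int I_A(x,x',y)\,\mathbf{Q}^Y(x',dy)\,\mathbf{Q}(dx,dx'). \]

It is easily seen that $\mathbf{P}^{\nu}|_{\mathcal{F}^Y_+} = \mathbf{Q}_1|_{\mathcal{F}^Y_+}$ and $\mathbf{P}^{\bar\nu}|_{\mathcal{F}^Y_+} = \mathbf{Q}_2|_{\mathcal{F}^Y_+}$. To complete the
proof, it therefore suffices to show that $\mathbf{Q}_1\sim\mathbf{Q}_2$. It is immediate, however,
that

\[ \frac{d\mathbf{Q}^Y(x',\cdot)}{d\mathbf{Q}^Y(x,\cdot)} = \prod_{k=0}^{N}\frac{g(x'(k),y(k))}{g(x(k),y(k))} \quad\text{whenever } x(n) = x'(n) \text{ for all } n>N, \]

where $g(z,y)$ is the observation density defined in Assumption 3.2. Thus,
evidently

\[ \mathbf{Q}_1\sim\mathbf{Q}_2 \quad\text{with}\quad \frac{d\mathbf{Q}_2}{d\mathbf{Q}_1} = \prod_{k=0}^{\tau}\frac{g(X'_k,Y_k)}{g(X_k,Y_k)}. \]

The proof is complete. □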
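
The mechanism behind the proof can be illustrated numerically: once two signal paths have coupled, every later factor in the likelihood ratio of the observation laws equals one, so that the ratio is a finite product of strictly positive terms. The following sketch assumes a finite observation space with uniform reference measure; the density table `G`, the sample paths and the function name `likelihood_ratio` are hypothetical choices made for this example only.

```python
# Hypothetical observation density g(z, u) with respect to the uniform
# reference measure on {0, 1}; each row averages to one, and all entries
# are strictly positive (nondegeneracy).
G = [[1.4, 0.6],
     [0.8, 1.2]]


def likelihood_ratio(x, x_prime, y, g):
    """Product over k of g(x'(k), y(k)) / g(x(k), y(k)).

    Factors with x(k) == x'(k) are equal to one, so only the times
    before the two paths couple contribute to the product.
    """
    ratio = 1.0
    for a, b, u in zip(x, x_prime, y):
        ratio *= g[b][u] / g[a][u]
    return ratio


# Two signal paths that agree from time 3 onward, and one observation path.
x = [0, 0, 1, 1, 0, 1]
x_prime = [1, 0, 0, 1, 0, 1]
y = [0, 1, 1, 0, 0, 1]

full = likelihood_ratio(x, x_prime, y, G)
truncated = likelihood_ratio(x[:3], x_prime[:3], y[:3], G)
```

Here `full` and `truncated` coincide, which reflects the fact that extending the observation horizon beyond the coupling time does not change the density. If some entry of `G` were zero, the ratio could vanish or be undefined, which is where nondegeneracy enters.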

We can now prove the following. 

Lemma 3.8. Suppose Assumptions 3.1 and 3.2 hold. Then there is a
strictly positive measurable $h : E\times\Omega^Y\times E\to\,]0,\infty[$ such that for $\mu\mathbf{P}^Y$-a.e.
$(z,y)$,

\[ P^X(z,y,A) = \int I_A(\tilde z)\,h(z,y,\tilde z)\,P(z,d\tilde z) \quad\text{for all } A\in\mathcal{B}(E). \]
Proof. By the Markov property, $P$ and $P^X$ are versions of the regular
conditional probabilities $\mathbf{P}(X_1\in\cdot\,|\,\sigma(X_0))$ and $\mathbf{P}(X_1\in\cdot\,|\,\sigma(X_0)\vee\mathcal{F}^Y_+)$,
respectively. By the Polish assumption, we can also introduce regular conditional
probabilities $R : E\times\mathcal{F}^Y_+\to[0,1]$ and $R^X : E\times E\times\mathcal{F}^Y_+\to[0,1]$ of the form
$\mathbf{P}((Y_k)_{k\ge 0}\in\cdot\,|\,\sigma(X_0))$ and $\mathbf{P}((Y_k)_{k\ge 0}\in\cdot\,|\,\sigma(X_0,X_1))$, respectively. Applying
Lemma 3.6 to the law of the triple $(X_0,X_1,(Y_k)_{k\ge 0})$, it evidently suffices
to show that there is a strictly positive measurable $h : E\times\Omega^Y\times E\to\,]0,\infty[$
such that

\[ R^X(z,z',A) = \int I_A(y)\,h(z,y,z')\,R(z,dy) \quad\text{for all } A\in\mathcal{F}^Y_+ \]

for $(z,z')\in H$ with $\mathbf{P}((X_0,X_1)\in H) = 1$.

By a well-known result on kernels ([15], Section V.58) there exists a
nonnegative measurable function $\tilde h : E\times\Omega^Y\times E\to[0,\infty[$ such that, for all
$z,z'\in E$,

\[ R^X(z,z',A) = \int I_A(y)\,\tilde h(z,y,z')\,R(z,dy) + R^{\perp}(z,z',A) \quad\text{for all } A\in\mathcal{F}^Y_+, \]

where the kernel $R^{\perp}$ is such that $R^{\perp}(z,z',\cdot)\perp R(z,\cdot)$ for every $z,z'\in E$.
Now suppose we can establish that $R^X(z,z',\cdot)\sim R(z,\cdot)$ for $(z,z')\in H$ with
$\mathbf{P}((X_0,X_1)\in H) = 1$. Then $R^{\perp}(z,z',\cdot) = 0$ for $(z,z')\in H$, and $\tilde h(z,y,z')>0$
except on a null set. We can then set $h(z,y,z') = 1$ whenever $\tilde h(z,y,z') = 0$,
and set $h(z,y,z') = \tilde h(z,y,z')$ otherwise; this gives a function $h$ with
the desired properties, completing the proof. It thus remains to show that
$R^X(z,z',\cdot)\sim R(z,\cdot)$ for $(z,z')\in H$ with $\mathbf{P}((X_0,X_1)\in H) = 1$.

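To this end, let us introduce convenient versions of the regular conditional
probabilities $R$ and $R^X$. Note that we may set

\[ \int f_0(y(0))\cdots f_n(y(n))\,R^X(z,z',dy) = \int f_0(u)\,\Phi(z,du)\times\mathbf{E}^{z'}\bigl(f_1(Y_0)\cdots f_n(Y_{n-1})\bigr) \]

for all bounded measurable $f_0,\ldots,f_n$ and $n<\infty$. Similarly, we may set

\[ \begin{aligned}
\int f_0(y(0))\cdots f_n(y(n))\,R(z,dy)
&= \int f_0(u)\,\Phi(z,du)\times\int\mathbf{E}^{\tilde z}\bigl(f_1(Y_0)\cdots f_n(Y_{n-1})\bigr)\,P(z,d\tilde z) \\
&= \int f_0(u)\,\Phi(z,du)\times\mathbf{E}^{P(z,\cdot)}\bigl(f_1(Y_0)\cdots f_n(Y_{n-1})\bigr).
\end{aligned} \]

It thus suffices to show that

\[ \mathbf{P}^{z'}|_{\mathcal{F}^Y_+}\sim\mathbf{P}^{P(z,\cdot)}|_{\mathcal{F}^Y_+} \quad\text{for } (z,z')\in H \text{ with } \mathbf{P}((X_0,X_1)\in H) = 1. \]

By Assumption 3.2 and Lemma 3.7, it suffices to show that

\[ \|\mathbf{P}^{z'}(X_n\in\cdot)-\mathbf{P}^{P(z,\cdot)}(X_n\in\cdot)\|_{\mathrm{TV}} \xrightarrow{n\to\infty} 0 \]

for $(z,z')\in H$ with $\mathbf{P}((X_0,X_1)\in H) = 1$.

Now note that by Assumption 3.1, we may choose a set $H_1$ of $\pi$-full
measure such that $\|\mathbf{P}^z(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\to 0$ as $n\to\infty$ for all $z\in H_1$. By
Lemma 2.6, there is a subset $H_2\subset H_1$ of $\pi$-full measure such that for every
$z\in H_2$ we have $\mathbf{P}^z(X_n\in H_2\text{ for all } n\ge 0) = 1$. In particular, for $z,z'\in H_2$,
we then have

\[ \begin{aligned}
\|\mathbf{P}^{z'}(X_n\in\cdot)-\mathbf{P}^{P(z,\cdot)}(X_n\in\cdot)\|_{\mathrm{TV}}
&\le \|\mathbf{P}^{z'}(X_n\in\cdot)-\pi\|_{\mathrm{TV}} \\
&\quad+ \int \|\mathbf{P}^{z''}(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\,P(z,dz'') \xrightarrow{n\to\infty} 0.
\end{aligned} \]

But $H = H_2\times H_2$ satisfies $\mathbf{P}((X_0,X_1)\in H) = 1$ by construction. □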

Combining Lemmas 3.5 and 3.8 now completes the proof of Theorem 3.4. 

4. Exchange of intersection and supremum of $\sigma$-fields. As is discussed
in the Introduction and in the following sections, key to the asymptotic
properties of nonlinear filters are certain identities for the observation and
signal $\sigma$-fields. For example, key to the proof of total variation stability
(Section 5) is the identity

\[ \bigcap_{n\ge 0}\mathcal{F}^Y_+\vee\mathcal{F}^X_{[n,\infty[} = \mathcal{F}^Y_+ \quad \mathbf{P}\text{-a.s.}, \]

and the goal of this section is to show that such identities hold under
Assumptions 3.1 and 3.2. The question can be seen as pertaining to the
permissibility of the exchange of the intersection and the supremum of $\sigma$-fields;
indeed, under Assumption 3.1 the tail $\sigma$-field $\mathcal{T}^X$ is $\mathbf{P}$-a.s. trivial, so that
the above identity can be written as

\[ \bigcap_{n\ge 0}\mathcal{F}^Y_+\vee\mathcal{F}^X_{[n,\infty[} = \mathcal{F}^Y_+\vee\bigcap_{n\ge 0}\mathcal{F}^X_{[n,\infty[} \quad \mathbf{P}\text{-a.s.} \]

The validity of such an exchange is a notoriously delicate problem [37].
For the sake of demonstration, we begin by proving the following lemma.

Lemma 4.1. Suppose that any (hence all) of the conditions of Theorem
2.3 are in force. Then the following holds true:

\[ \bigcap_{n\ge 0}\mathcal{F}^Y\vee\mathcal{F}^X_{[n,\infty[} = \mathcal{F}^Y \quad\text{and}\quad \bigcap_{n\ge 0}\mathcal{F}^Y\vee\mathcal{F}^X_{-n} = \mathcal{F}^Y \quad \mathbf{P}\text{-a.s.} \]
The interest of this lemma is independent of the remainder of the paper;
it follows directly from Theorem 2.3, and thus serves as a simplified
demonstration of the proof of the exchange of intersection and supremum property.
Unfortunately, this result is not in itself of use in proving asymptotic prop- 
erties of nonlinear filters, as the entire observation field T Y appears in the 
expression rather than the positive and negative time observations J- Y and 
T Y . Using additional coupling and time reversal arguments, we will prove 
the following useful result. 

Theorem 4.2. Suppose that Assumptions 3.1 and 3.2 are in force. Then 

n>0 n>0 

The proof of Lemma 4.1 is given in Section 4.1 below, while the proof of 
Theorem 4.2 is contained in Sections 4.2-4.4. 

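4.1. Proof of Lemma 4.1. In [37], von Weizsäcker studied problems of
this type in a general setting, and Lemma 4.1 can be derived from his result
and Theorem 2.3. As the idea is straightforward, however, we give a direct
proof here.

Let us begin by proving the assertion

\[ \bigcap_{n\ge 0}\mathcal{F}^Y\vee\mathcal{F}^X_{[n,\infty[} = \mathcal{F}^Y \quad \mathbf{P}\text{-a.s.} \]

It suffices to show that, for every $A\in\mathcal{F}$,

\[ \mathbf{P}\Bigl(A\,\Big|\,\bigcap_{n\ge 0}\mathcal{F}^Y\vee\mathcal{F}^X_{[n,\infty[}\Bigr) = \mathbf{P}(A\,|\,\mathcal{F}^Y) \quad \mathbf{P}\text{-a.s.} \]

As bounded random variables of the form $F(x,y) = f(x)g(y)$ are total in
$L^1(\mathbf{P})$, it suffices to verify the statement for $A\in\mathcal{F}^X$ only. By the martingale
convergence theorem, it is sufficient to show that, for any $A\in\mathcal{F}^X$,

\[ \mathbf{P}(A\,|\,\mathcal{F}^Y\vee\mathcal{F}^X_{[n,\infty[}) \xrightarrow{n\to\infty} \mathbf{P}(A\,|\,\mathcal{F}^Y) \quad\text{in } L^1(\mathbf{P}). \]

We now appeal to the following fact: as $\mathcal{F}^X_{[n,\infty[}$ is countably generated, we
have

\[ \mathbf{P}(A\,|\,\mathcal{F}^Y\vee\mathcal{F}^X_{[n,\infty[}) = \mathbf{P}_Y(A\,|\,\mathcal{F}^X_{[n,\infty[}) \quad \mathbf{P}\text{-a.s.} \]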

for any A € J- x , where we have used that (Lemma 2.2) Py(-) is a regular 
conditional probability of the form P(-\J- Y ); see [37], Lemma 4. ILL But 

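\[ \mathbf{E}_y\bigl(|\mathbf{P}_y(A\,|\,\mathcal{F}^X_{[n,\infty[})-\mathbf{P}_y(A)|\bigr) \xrightarrow{n\to\infty} 0 \quad\text{for } \mathbf{P}^Y\text{-a.e. } y \]

follows by martingale convergence and the following lemma.

Lemma 4.3. Suppose that any (hence all) of the conditions of Theorem
2.3 hold. Then the tail $\sigma$-field $\mathcal{T}^X$ is $\mathbf{P}_y$-trivial for $\mathbf{P}^Y$-a.e. $y$.

Proof. By condition 1 of Theorem 2.3, we find that

\[ \begin{aligned}
&\int \|\mathbf{P}_{z,y}(X_n\in\cdot)-\mathbf{P}_y(X_n\in\cdot)\|_{\mathrm{TV}}\,\mu(y,dz)\,\mathbf{P}^Y(dy) \\
&\qquad\le \int \|\mathbf{P}_{z,y}(X_n\in\cdot)-\mathbf{P}_{z',y}(X_n\in\cdot)\|_{\mathrm{TV}}\,\mu(y,dz')\,\mu(y,dz)\,\mathbf{P}^Y(dy)
\end{aligned} \]

converges to zero as $n\to\infty$. But as $\|\mathbf{P}_{z,y}(X_n\in\cdot)-\mathbf{P}_y(X_n\in\cdot)\|_{\mathrm{TV}}$ is
nonincreasing, we find that $\|\mathbf{P}_{z,y}(X_n\in\cdot)-\mathbf{P}_y(X_n\in\cdot)\|_{\mathrm{TV}}\to 0$ as $n\to\infty$ for
$\mu\mathbf{P}^Y$-a.e. $(z,y)$. Note that by the Markov property of $(X_n)_{n\ge 0}$ under $\mathbf{P}_{z,y}$,

\[ \|\mathbf{P}_{z,y}(X_n\in\cdot)-\mathbf{P}_y(X_n\in\cdot)\|_{\mathrm{TV}} = \bigl\|\mathbf{P}_{z,y}|_{\mathcal{F}^X_{[n,\infty[}}-\mathbf{P}_y|_{\mathcal{F}^X_{[n,\infty[}}\bigr\|_{\mathrm{TV}} \xrightarrow{n\to\infty} \bigl\|\mathbf{P}_{z,y}|_{\mathcal{T}^X}-\mathbf{P}_y|_{\mathcal{T}^X}\bigr\|_{\mathrm{TV}} \]

(see, e.g., [24], Section III.20). Therefore, $\mathbf{P}_{z,y}|_{\mathcal{T}^X} = \mathbf{P}_y|_{\mathcal{T}^X}$ for $\mu\mathbf{P}^Y$-a.e.
$(z,y)$, and it remains to invoke condition 2 of Theorem 2.3. □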

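We can now easily complete the proof of $\bigcap_{n\ge 0}\mathcal{F}^Y\vee\mathcal{F}^X_{[n,\infty[} = \mathcal{F}^Y$ $\mathbf{P}$-a.s.
Indeed, integrating with respect to $\mathbf{P}^Y$, we find by dominated convergence
that

\[ \mathbf{E}\bigl(|\mathbf{P}_Y(A\,|\,\mathcal{F}^X_{[n,\infty[})-\mathbf{P}_Y(A)|\bigr) \xrightarrow{n\to\infty} 0 \]

and the result now follows directly.

We now turn to the proof of the assertion

\[ \bigcap_{n\ge 0}\mathcal{F}^Y\vee\mathcal{F}^X_{-n} = \mathcal{F}^Y \quad \mathbf{P}\text{-a.s.} \]

As above, it suffices to show that, for every $A\in\mathcal{F}^X$,

\[ \mathbf{P}(A\,|\,\mathcal{F}^Y\vee\mathcal{F}^X_{-n}) \xrightarrow{n\to\infty} \mathbf{P}(A\,|\,\mathcal{F}^Y) \quad\text{in } L^1(\mathbf{P}). \]

In fact, it suffices to establish only that

\[ \mathbf{E}\bigl(f_1(X_{k_1})\cdots f_\ell(X_{k_\ell})\,\big|\,\mathcal{F}^Y\vee\mathcal{F}^X_{-n}\bigr) \xrightarrow{n\to\infty} \mathbf{E}\bigl(f_1(X_{k_1})\cdots f_\ell(X_{k_\ell})\,\big|\,\mathcal{F}^Y\bigr) \quad\text{in } L^1(\mathbf{P}) \]

for all $\ell<\infty$, $k_1,\ldots,k_\ell\in\mathbb{Z}$, and bounded measurable functions $f_1,\ldots,f_\ell$, as
the family of functions of the form $f_1(X_{k_1})\cdots f_\ell(X_{k_\ell})$ is total in $L^1(\mathcal{F}^X,\mathbf{P})$.
Now note that by the last property of Lemma 2.2, we can write

\[ \mathbf{E}\bigl(f_1(X_{k_1})\cdots f_\ell(X_{k_\ell})\,\big|\,\mathcal{F}^Y\vee\mathcal{F}^X_{-n}\bigr) = \mathbf{E}_{X_{-n},\,Y\circ\Theta^{-n}}\bigl(f_1(X_{k_1+n})\cdots f_\ell(X_{k_\ell+n})\bigr), \]

\[ \mathbf{E}\bigl(f_1(X_{k_1})\cdots f_\ell(X_{k_\ell})\,\big|\,\mathcal{F}^Y\bigr) = \mathbf{E}_{Y\circ\Theta^{-n}}\bigl(f_1(X_{k_1+n})\cdots f_\ell(X_{k_\ell+n})\bigr). \]

Therefore, using the stationarity of $\mathbf{P}$, we find that

\[ \begin{aligned}
\mathbf{E}\bigl(|\mathbf{E}(\Delta_0\,|\,\mathcal{F}^Y\vee\mathcal{F}^X_{-n})-\mathbf{E}(\Delta_0\,|\,\mathcal{F}^Y)|\bigr)
&= \int |\mathbf{E}_{z,y}(\Delta_n)-\mathbf{E}_y(\Delta_n)|\,\mu(y,dz)\,\mathbf{P}^Y(dy) \\
&\le \int |\mathbf{E}_{z,y}(\Delta_n)-\mathbf{E}_{z',y}(\Delta_n)|\,\mu(y,dz)\,\mu(y,dz')\,\mathbf{P}^Y(dy),
\end{aligned} \]

where we have written $\Delta_n = f_1(X_{k_1+n})\cdots f_\ell(X_{k_\ell+n})$ for simplicity. It follows
(see, e.g., [24], Section III.20) from the first condition of Theorem 2.3 that
this expression converges to zero as $n\to\infty$, and thus the claim is established.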

4.2. Time reversal. In order to apply the theory of Markov chains in
random environments, it was important to condition the signal process on
all observations $\mathcal{F}^Y$. Note that the conditional probability $\mathbf{P}(X_0\in\cdot\,|\,\mathcal{F}^Y)$
satisfies the property $\mathbf{P}(X_0\in\cdot\,|\,\mathcal{F}^Y)\circ\Theta^n = \mathbf{P}(X_n\in\cdot\,|\,\mathcal{F}^Y)$ which was used
repeatedly in Section 2; this property is not shared by the conditional
probability $\mathbf{P}(X_0\in\cdot\,|\,\mathcal{F}^Y_+)$. An unfortunate consequence is that we obtain the
triviality of $\mathcal{T}^X$ under the regular conditional probability $\mathbf{P}(\cdot\,|\,\mathcal{F}^Y)$, which
leads to Lemma 4.1, rather than the triviality of $\mathcal{T}^X$ under $\mathbf{P}(\cdot\,|\,\mathcal{F}^Y_+)$, which
would give (the first part of) Theorem 4.2.

To prove Theorem 4.2, we must therefore eliminate the dependence of
our results to date on the past observations. As we will see in the following
subsections, this can be done provided that the signal is not only ergodic
forward in time (as is guaranteed by Assumption 3.1) but also after time
reversal; in essence, we aim to establish that the remote past of the signal
does not depend on the present. In this subsection, we will show that this
property in fact already follows from Assumption 3.1, so that no additional
assumptions need to be imposed.

In the following we will extend the definition of $\mathbf{P}^z$ to negative times, that
is, $\mathbf{P}^z$ is a version of the regular conditional probability $\mathbf{P}(\cdot\,|\,X_0)$. Note that
the time reversed signal $\tilde X_n = X_{-n}$ is again a Markov chain under $\mathbf{P}$ and
$\mathbf{P}^z$ with stationary measure $\pi$. The goal of this subsection is to prove the
following result.

Proposition 4.4. Suppose that Assumption 3.1 holds. Then

\[ \|\mathbf{P}^z(X_{-n}\in\cdot)-\pi\|_{\mathrm{TV}} \xrightarrow{n\to\infty} 0 \quad\text{for } \pi\text{-a.e. } z\in E. \]

We will need the following lemma on regular conditional probabilities. 

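Lemma 4.5. Let $G$ be a Polish space. Denote by $\gamma_1 : G\times G\to G$ and
$\gamma_2 : G\times G\to G$ the coordinate projections and by $\mathcal{G}_1$ and $\mathcal{G}_2$ the $\sigma$-fields
generated by $\gamma_1$ and $\gamma_2$, respectively. Consider a probability measure $\pi$ on
$(G,\mathcal{B}(G))$, and a probability measure $\mathbf{P}$ on $(G\times G,\mathcal{B}(G\times G))$ such that
the laws of $\gamma_1$ and $\gamma_2$ under $\mathbf{P}$ both equal $\pi$. Denote by $P_1 : G\times\mathcal{B}(G)\to
[0,1]$ and $P_2 : G\times\mathcal{B}(G)\to[0,1]$ the regular conditional probabilities of the
form $\mathbf{P}(\gamma_1\in\cdot\,|\,\mathcal{G}_2)$ and $\mathbf{P}(\gamma_2\in\cdot\,|\,\mathcal{G}_1)$, respectively, and consider their Lebesgue
decompositions

\[ \begin{aligned}
\mathbf{P}(A) &= \int I_A(z,z')\,p(z,z')\,\pi(dz)\,\pi(dz') + \mathbf{P}^{\perp}(A), \\
P_1(z',A) &= \int I_A(z)\,p_1(z,z')\,\pi(dz) + P_1^{\perp}(z',A), \\
P_2(z,A) &= \int I_A(z')\,p_2(z,z')\,\pi(dz') + P_2^{\perp}(z,A),
\end{aligned} \]

where $\mathbf{P}^{\perp}\perp\pi\otimes\pi$, $P_1^{\perp}(z',\cdot)\perp\pi$ and $P_2^{\perp}(z,\cdot)\perp\pi$, and $p,p_1,p_2 : G\times G\to
[0,\infty[$ are measurable. Then $p(z,z') = p_1(z,z') = p_2(z,z')$ for $\pi\otimes\pi$-a.e. $(z,z')$.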

Proof. The existence of regular conditional probabilities follows from 
the Polish assumption, while the existence of measurable pi,P2 follows from 
[15], Section V.58. It also follows from [15], Sections V.56-58, that there 
exist 51,5*2 S B(G x G) such that (tt <g) tt)(5i) = (7T ® 7r)(5 2 ) = 1 and for 
7r-a.e. z, z', 



Now note that, by the disintegration of measures, we have for all A,B £ B(G) 



Now substitute in the Lebesgue decompositions of Pi and P2, and note that 



Therefore, P\ tt _L tt (g> tt and Pr r~n _L tt ® 7r. But by the uniqueness of the 
Lebesgue decomposition of P, this implies that 




I A {dz)I B (dz')p(z,z , )'!r(dz)Tr(dz , ) + P ± (A x 5) 










for all A,B € B(G), from which the result follows. □ 
We can now prove Proposition 4.4.

Proof of Proposition 4.4. Denote by $f_n(z,z')$ the density in the
Lebesgue decomposition of $\mathbf{P}^z(X_n\in\cdot)$ with respect to $\pi$. Then by
Assumption 3.1,

\[ \int |f_n(z,z')-1|\,\pi(dz)\,\pi(dz') \xrightarrow{n\to\infty} 0. \]

In particular, there is a subsequence $n_k\nearrow\infty$ such that

\[ \int |f_{n_k}(z,z')-1|\,\pi(dz) \xrightarrow{k\to\infty} 0 \quad\text{for } \pi\text{-a.e. } z'. \]

But by the previous lemma and by stationarity, $f_n(z,z')$ is also the density
in the Lebesgue decomposition of $\mathbf{P}^{z'}(X_{-n}\in\cdot)$ with respect to $\pi$. It follows
that $\|\mathbf{P}^{z'}(X_{-n_k}\in\cdot)-\pi\|_{\mathrm{TV}}\to 0$ as $k\to\infty$ for $\pi$-a.e. $z'$. But $\tilde X_n = X_{-n}$
is again Markov, so $\|\mathbf{P}^{z'}(X_{-n}\in\cdot)-\pi\|_{\mathrm{TV}}$ is nonincreasing and the result
follows. □
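
In the finite state case the time reversed chain can be written down explicitly, which gives a quick numerical sanity check of Proposition 4.4. The sketch below is an illustration only: the three-state kernel `P` (chosen to be non-reversible) and all function names are assumptions of ours, and the formula used for the reversed kernel, $\hat P(z,z') = \pi(z')P(z',z)/\pi(z)$, is the standard one for a stationary chain on a finite state space.

```python
# Hypothetical non-reversible three-state kernel (rows sum to one).
P = [[0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8],
     [0.8, 0.1, 0.1]]


def step(law, kernel):
    """Law of the next state given the law of the current state."""
    return [sum(law[i] * kernel[i][j] for i in range(len(law)))
            for j in range(len(kernel[0]))]


def stationary_law(kernel, iterations=500):
    """Approximate the stationary law by iterating from the uniform law."""
    law = [1.0 / len(kernel)] * len(kernel)
    for _ in range(iterations):
        law = step(law, kernel)
    return law


def reversed_kernel(kernel, pi):
    """Kernel of the time reversed chain: pi(j) P(j, i) / pi(i)."""
    n = len(kernel)
    return [[pi[j] * kernel[j][i] / pi[i] for j in range(n)]
            for i in range(n)]


def tv_norm(mu, nu):
    """Total variation norm of mu - nu, with values in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(mu, nu))


def tv_to_stationarity(kernel, pi, z, n_max):
    """Distances to pi of the n-step laws started at z, n = 0..n_max."""
    law = [1.0 if i == z else 0.0 for i in range(len(pi))]
    distances = []
    for _ in range(n_max + 1):
        distances.append(tv_norm(law, pi))
        law = step(law, kernel)
    return distances


PI = stationary_law(P)
P_REV = reversed_kernel(P, PI)
```

Applying `tv_to_stationarity(P_REV, PI, z, n)` shows the same qualitative behavior as for the forward kernel: the distances are nonincreasing in the number of steps and tend to zero, although `P_REV` differs from `P`.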

4.3. Equivalence of the initial measures. Let us begin by fixing a version
$\mu^+ : \Omega^Y\times\mathcal{B}(E)\to[0,1]$ of the regular conditional probability $\mathbf{P}(X_0\in\cdot\,|\,\mathcal{F}^Y_+)$.
We can then define a probability kernel $\mathbf{P}^+ : \Omega^Y\times\mathcal{F}^X_+\to[0,1]$ by setting

\[ \mathbf{P}^+_y(A) = \int \mathbf{P}_{z,y}(A)\,\mu^+(y,dz) \quad\text{for all } A\in\mathcal{F}^X_+,\ y\in\Omega^Y. \]

It is not difficult to see that $\mathbf{P}^+_y$ is a version of the regular conditional
probability $\mathbf{P}(\cdot\,|\,\mathcal{F}^Y_+)$; indeed, it suffices to note that by the Markov property
$\mathbf{P}_{z,y}$ is a version of the regular conditional probability $\mathbf{P}(\cdot\,|\,\sigma(X_0)\vee\mathcal{F}^Y_+)$. We
also recall that

\[ \mathbf{P}_y(A) = \int \mathbf{P}_{z,y}(A)\,\mu(y,dz) \quad\text{for all } A\in\mathcal{F}^X_+,\ y\in\Omega^Y \]

is a version of the regular conditional probability $\mathbf{P}(\cdot\,|\,\mathcal{F}^Y)$.

Theorem 2.3 establishes that the tail $\sigma$-field $\mathcal{T}^X$ is $\mathbf{P}_y$-a.s. trivial for
$\mathbf{P}^Y$-a.e. $y$ (Lemma 4.3). To demonstrate the first part of Theorem 4.2 along the
lines of the proof of Lemma 4.1, however, we would have to show that $\mathcal{T}^X$
is $\mathbf{P}^+_y$-a.s. trivial for $\mathbf{P}^Y$-a.e. $y$. The latter would follow from the former if
we could show that $\mathbf{P}^+_y\sim\mathbf{P}_y$ for $\mathbf{P}^Y$-a.e. $y$, and it evidently suffices to show
that $\mu^+(y,\cdot)\sim\mu(y,\cdot)$ for $\mathbf{P}^Y$-a.e. $y$. The purpose of this subsection is to
prove that this is indeed the case under Assumptions 3.1 and 3.2. In fact,
we will prove the following stronger statement: $\mu^+(y,\cdot)\sim\pi$ and $\mu(y,\cdot)\sim\pi$
for $\mathbf{P}^Y$-a.e. $y$.

The easy part of the proof is contained in the following lemma. 
Lemma 4.6. Suppose Assumptions 3.1 and 3.2 hold. Then there is a
strictly positive measurable $k^+ : \Omega^Y\times E\to\,]0,\infty[$ such that, for $\mathbf{P}^Y$-a.e.
$y\in\Omega^Y$,

\[ \mu^+(y,A) = \int I_A(z)\,k^+(y,z)\,\pi(dz) \quad\text{for all } A\in\mathcal{B}(E). \]

Proof. By Lemma 3.6, it suffices to show that there exists a strictly
positive measurable $k^+ : \Omega^Y\times E\to\,]0,\infty[$ such that, for $\pi$-a.e. $z\in E$,

\[ \mathbf{P}^z(B) = \int I_B(y)\,k^+(y,z)\,\mathbf{P}^Y(dy) \quad\text{for all } B\in\mathcal{F}^Y_+. \]

But this follows immediately from Lemma 3.7 and Assumptions 3.1 and 3.2.
□

It remains to prove the corresponding result for $\mu$. Though we will proceed
along the same lines, the proof is complicated by the fact that Lemma 3.7
only establishes equivalence for observations at positive times $\mathcal{F}^Y_+$ and not
on the entire time interval $\mathcal{F}^Y$. We therefore set out to extend Lemma 3.7
to $\mathcal{F}^Y$.

Lemma 4.7. Under Assumptions 3.1 and 3.2, $\mathbf{P}^z|_{\mathcal{F}^Y}\sim\mathbf{P}|_{\mathcal{F}^Y}$ for $\pi$-a.e. $z$.

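Proof. By the Markov property of the signal process, $\mathcal F^X_{-n}$ and $\mathcal F^X_{[n,\infty[}$
are independent under $\mathbf P^z$. We can therefore estimate as follows:

$$\begin{aligned}
&\big\|\mathbf P^z|_{\mathcal F^X_{-n}\vee\mathcal F^X_{[n,\infty[}}-\mathbf P^{z'}|_{\mathcal F^X_{-n}\vee\mathcal F^X_{[n,\infty[}}\big\|_{\mathrm{TV}}\\
&\quad=\big\|\mathbf P^z|_{\mathcal F^X_{-n}}\otimes\mathbf P^z|_{\mathcal F^X_{[n,\infty[}}-\mathbf P^{z'}|_{\mathcal F^X_{-n}}\otimes\mathbf P^{z'}|_{\mathcal F^X_{[n,\infty[}}\big\|_{\mathrm{TV}}\\
&\quad\le\big\|\mathbf P^z|_{\mathcal F^X_{-n}}-\mathbf P^{z'}|_{\mathcal F^X_{-n}}\big\|_{\mathrm{TV}}+\big\|\mathbf P^z|_{\mathcal F^X_{[n,\infty[}}-\mathbf P^{z'}|_{\mathcal F^X_{[n,\infty[}}\big\|_{\mathrm{TV}}\\
&\quad=\|\mathbf P^z(X_{-n}\in\cdot)-\mathbf P^{z'}(X_{-n}\in\cdot)\|_{\mathrm{TV}}+\|\mathbf P^z(X_n\in\cdot)-\mathbf P^{z'}(X_n\in\cdot)\|_{\mathrm{TV}}\\
&\quad\le\|\mathbf P^z(X_{-n}\in\cdot)-\pi\|_{\mathrm{TV}}+\|\mathbf P^{z'}(X_{-n}\in\cdot)-\pi\|_{\mathrm{TV}}\\
&\qquad+\|\mathbf P^z(X_n\in\cdot)-\pi\|_{\mathrm{TV}}+\|\mathbf P^{z'}(X_n\in\cdot)-\pi\|_{\mathrm{TV}}.
\end{aligned}$$

Here we have used the Markov property of $X_n$ and $\tilde X_n=X_{-n}$, and the
elementary inequality $\|\mu_1\otimes\nu_1-\mu_2\otimes\nu_2\|_{\mathrm{TV}}\le\|\mu_1-\mu_2\|_{\mathrm{TV}}+\|\nu_1-\nu_2\|_{\mathrm{TV}}$. By
Assumption 3.1 and Proposition 4.4, we now find that

$$\big\|\mathbf P^z|_{\mathcal F^X_{-n}\vee\mathcal F^X_{[n,\infty[}}-\mathbf P^{z'}|_{\mathcal F^X_{-n}\vee\mathcal F^X_{[n,\infty[}}\big\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0\quad\text{for }\pi\otimes\pi\text{-a.e. }(z,z').$$

But then we have

$$\big\|\mathbf P^z|_{\mathcal F^X_{-n}\vee\mathcal F^X_{[n,\infty[}}-\mathbf P|_{\mathcal F^X_{-n}\vee\mathcal F^X_{[n,\infty[}}\big\|_{\mathrm{TV}}\le\int\big\|\mathbf P^z|_{\mathcal F^X_{-n}\vee\mathcal F^X_{[n,\infty[}}-\mathbf P^{z'}|_{\mathcal F^X_{-n}\vee\mathcal F^X_{[n,\infty[}}\big\|_{\mathrm{TV}}\,\pi(dz')\xrightarrow{n\to\infty}0\quad\text{for }\pi\text{-a.e. }z.$$

In particular, $\mathbf P$ and $\mathbf P^z$ agree on the remote $\sigma$-field for $\pi$-a.e. $z$:

$$\mathbf P^z|_{\mathcal R^X}=\mathbf P|_{\mathcal R^X}\quad\text{for }\pi\text{-a.e. }z,\qquad\mathcal R^X=\bigcap_{n\ge0}\mathcal F^X_{-n}\vee\mathcal F^X_{[n,\infty[}.$$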

From this point onward, we fix an arbitrary $z$ such that $\mathbf P^z|_{\mathcal R^X}=\mathbf P|_{\mathcal R^X}$. To
complete the proof, it suffices to show that this implies $\mathbf P^z|_{\mathcal F^Y}\sim\mathbf P|_{\mathcal F^Y}$.

To proceed, we note that the remote $\sigma$-field $\mathcal R^X$ coincides with the tail
$\sigma$-field of the one-sided sequence $(X_{-n},X_n)_{n\ge0}$. We can therefore apply the
maximal coupling theorem [24], Theorem III.14.10, to this sequence. In particular,
we find that we can construct a probability measure $\mathbf Q:\mathcal B(E^{\mathbb Z}\times E^{\mathbb Z})\to[0,1]$ such that:

1. The law of $(X_n)_{n\in\mathbb Z}$ under $\mathbf Q$ coincides with the law of $(X_n)_{n\in\mathbb Z}$ under $\mathbf P^z$;

2. The law of $(X'_n)_{n\in\mathbb Z}$ under $\mathbf Q$ coincides with the law of $(X_n)_{n\in\mathbb Z}$ under $\mathbf P$;

3. There is a random time $\tau<\infty$ such that a.s. $X_n=X'_n$ for all $|n|\ge\tau$.

Here $X_n$ and $X'_n$ are the canonical coordinate processes on $E^{\mathbb Z}\times E^{\mathbb Z}$. The
remainder of the proof now proceeds exactly as the proof of Lemma 3.7. □
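
The product estimate used in the proof of Lemma 4.7 can be checked numerically on finite sets. The following is a minimal sketch, assuming the convention $\|p-q\|_{\mathrm{TV}}=\sum_i|p_i-q_i|$; the helper names `tv` and `tensor` and the four probability vectors are illustrative choices, not objects from the paper.

```python
def tv(p, q):
    """Total variation distance with the convention ||p - q|| = sum_i |p_i - q_i|."""
    return sum(abs(a - b) for a, b in zip(p, q))


def tensor(p, q):
    """Product measure p (x) q on the product of two finite sets (row-major order)."""
    return [a * b for a in p for b in q]


# Hypothetical probability vectors, chosen only for illustration.
mu1, mu2 = [0.2, 0.8], [0.5, 0.5]
nu1, nu2 = [0.1, 0.3, 0.6], [0.3, 0.3, 0.4]

lhs = tv(tensor(mu1, nu1), tensor(mu2, nu2))
rhs = tv(mu1, mu2) + tv(nu1, nu2)
print(lhs, rhs)  # the left-hand side never exceeds the right-hand side
```

The same bound holds for any choice of probability vectors, since it follows from adding and subtracting the mixed product $\mu_2\otimes\nu_1$.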

We can now prove the equivalence of $\mu(y,\cdot)$ and $\pi$.

Lemma 4.8. Suppose Assumptions 3.1 and 3.2 hold. Then there is a
strictly positive measurable $k:\Omega^Y\times E\to\,]0,\infty[$ such that, for $\mathbf P^Y$-a.e. $y\in\Omega^Y$,

$$\mu(y,A)=\int I_A(z)\,k(y,z)\,\pi(dz)\quad\text{for all }A\in\mathcal B(E).$$

Proof. By Lemma 3.6, it suffices to show that there exists a strictly
positive measurable $k:\Omega^Y\times E\to\,]0,\infty[$ such that, for $\pi$-a.e. $z\in E$,

$$\mathbf P^z(B)=\int I_B(y)\,k(y,z)\,\mathbf P(dy)\quad\text{for all }B\in\mathcal F^Y.$$

But this follows immediately from Lemma 4.7 and Assumptions 3.1 and 3.2.
□



The following corollary follows directly.

Corollary 4.9. Suppose that Assumptions 3.1 and 3.2 hold true. Then
$\mathbf P^+_y|_{\mathcal F^X_+}\sim\mathbf P_y|_{\mathcal F^X_+}$ for $\mathbf P^Y$-a.e. $y\in\Omega^Y$.
In particular, $\mathbf P^+_y|_{\mathcal T^X}\sim\mathbf P_y|_{\mathcal T^X}$ for $\mathbf P^Y$-a.e. $y\in\Omega^Y$.

4.4. Proof of Theorem 4.2. We begin by proving the first assertion

$$\bigcap_{n\ge0}\mathcal F^Y_+\vee\mathcal F^X_{[n,\infty[}=\mathcal F^Y_+\quad\mathbf P\text{-a.s.}$$

This would follow exactly as in the proof of the first part of Lemma 4.1 if we
could show that $\mathcal T^X$ is $\mathbf P^+_y$-a.s. trivial for $\mathbf P^Y$-a.e. $y$. But this follows directly
from Lemma 4.3 and Corollary 4.9, so the claim is established.
We now turn to the second assertion

$$\bigcap_{n\ge0}\mathcal F^Y_0\vee\mathcal F^X_{-n}=\mathcal F^Y_0\quad\mathbf P\text{-a.s.}$$

Note that this assertion is precisely equivalent to the first assertion of the
theorem after time reversal. But by Proposition 4.4, the reversed Markov
chain $\tilde X_n=X_{-n}$ satisfies Assumption 3.1 whenever the forward chain $X_n$
does, and Assumption 3.2 is invariant under time reversal. Thus, it suffices
to apply the first part of the theorem to the hidden Markov model obtained
by replacing the forward transition kernel $P(z,\cdot)$ by the backward transition
kernel $\mathbf P^z(X_{-1}\in\cdot)$. This completes the proof.

5. Total variation stability of the nonlinear filter. Let us begin with a
brief reminder of elementary filtering theory. The purpose of nonlinear filtering
is to compute conditional probabilities of the form $\mathbf P^\mu(X_n\in\cdot\,|\,\mathcal F^Y_{[0,n]})$. We
will choose fixed versions of these regular conditional probabilities according
to the following well-known lemma, whose proof we provide for future
reference.



Lemma 5.1. Suppose that Assumption 3.2 holds. For every probability
measure $\mu$ on $\mathcal B(E)$, we define a sequence of probability kernels $\Pi^\mu_n:\Omega^Y\times\mathcal B(E)\to[0,1]$
($n\ge0$) through the following recursion:

$$\Pi^\mu_n(y,A)=\frac{\int I_A(z)\,g(z,y(n))\,P(z',dz)\,\Pi^\mu_{n-1}(y,dz')}{\int g(z,y(n))\,P(z',dz)\,\Pi^\mu_{n-1}(y,dz')},\qquad
\Pi^\mu_0(y,A)=\frac{\int I_A(z)\,g(z,y(0))\,\mu(dz)}{\int g(z,y(0))\,\mu(dz)},$$

where $g$ is the observation density defined in Assumption 3.2. Then $\Pi^\mu_n$ is
a version of the regular conditional probability $\mathbf P^\mu(X_n\in\cdot\,|\,\mathcal F^Y_{[0,n]})$ for every
$n\ge0$.

Proof. Writing out the recursion, we find that

$$\Pi^\mu_n(y,A)=\frac{\mathbf E^\mu\big(g(X_0,y(0))\cdots g(X_n,y(n))\,I_A(X_n)\big)}{\mathbf E^\mu\big(g(X_0,y(0))\cdots g(X_n,y(n))\big)}.$$
But note that, by construction, 

g(X ,Y )...g(X n ,Y n )- ^°' n] 



■>,n] 

Y 



so that by the Bayes formula $\Pi^\mu_n(Y,A)=\mathbf P^\mu(X_n\in A\,|\,\mathcal F^Y_{[0,n]})$ $\mathbf P^\mu$-a.s. □
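
To make the recursion of Lemma 5.1 concrete, the following minimal sketch implements it on a finite state space and compares it with the Bayes formula from the proof, evaluated by brute force over all signal paths. The two-state transition matrix `P`, the observation density `g` (taken with respect to counting measure on two observation symbols), the initial law `mu` and the observation record `ys` are hypothetical choices made only for illustration.

```python
from itertools import product


def filter_step(prev, P, g, obs):
    """One recursion step: propagate prev through P, reweight by g(., obs), normalize."""
    m = len(prev)
    unnorm = [g[z][obs] * sum(prev[zp] * P[zp][z] for zp in range(m)) for z in range(m)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def run_filter(mu, P, g, ys):
    """Return the filter at time n = len(ys) - 1 for the initial law mu."""
    unnorm = [g[z][ys[0]] * mu[z] for z in range(len(mu))]
    total = sum(unnorm)
    pi = [u / total for u in unnorm]
    for obs in ys[1:]:
        pi = filter_step(pi, P, g, obs)
    return pi


def brute_force(mu, P, g, ys):
    """Bayes formula: weight every signal path by the product of observation densities."""
    m, n = len(mu), len(ys)
    num = [0.0] * m
    for path in product(range(m), repeat=n):
        w = mu[path[0]] * g[path[0]][ys[0]]
        for k in range(1, n):
            w *= P[path[k - 1]][path[k]] * g[path[k]][ys[k]]
        num[path[-1]] += w
    total = sum(num)
    return [x / total for x in num]


# Hypothetical model data (not from the paper).
P = [[0.9, 0.1], [0.2, 0.8]]   # P[z'][z] = P(z', {z})
g = [[0.7, 0.3], [0.4, 0.6]]   # g[z][u] > 0, observation density
mu = [0.5, 0.5]
ys = [0, 1, 1, 0]

recursive = run_filter(mu, P, g, ys)
direct = brute_force(mu, P, g, ys)
print(recursive, direct)
```

The recursive form costs a number of operations linear in the length of the observation record, whereas the path sum grows exponentially; the two agree up to rounding.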

The filter stability problem can now be phrased as follows: under which
conditions does the filter $\Pi^\mu_n$ become independent of $\mu$ for large $n$? The
main goal of this section is to give a precise answer to this question under
Assumptions 3.1 and 3.2. To this end, we will prove the following theorem.

Theorem 5.2. Suppose that Assumptions 3.1 and 3.2 hold. Then

$$\|\Pi^\mu_n-\Pi^\pi_n\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0\quad\mathbf P^\mu\text{-a.s.}\qquad\text{iff}\qquad\|\mathbf P^\mu(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0.$$

The following corollaries are essentially immediate. 

Corollary 5.3. Suppose that Assumptions 3.1 and 3.2 hold, and call
the probability measure $\mu$ stable if $\|\Pi^\mu_n-\Pi^\pi_n\|_{\mathrm{TV}}\to0$ $\mathbf P^\mu$-a.s. as $n\to\infty$.
Then $\mu$ is stable whenever $\mu\ll\pi$, and $\delta_z$ is stable for $\pi$-a.e. $z\in E$. Moreover,
stability holds for all $\mu$ if and only if the signal process is Harris recurrent
and aperiodic.

Proof. The first two statements follow directly from Assumption 3.1,
while the last statement follows from [30], Proposition 3.6, and the fact that,
by assumption, the signal possesses a finite invariant measure $\pi$. □
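
The notion of stability in Corollary 5.3 can be illustrated numerically. The sketch below runs the filter recursion from two nearly singular initial laws along the same observation record and tracks the total variation distance between the two filters. All model data are hypothetical; the observation record is an arbitrary sequence produced by a seeded generator rather than a sample from the model, and the strictly positive transition matrix is a far more special situation than Assumption 3.1, so this is only a toy illustration of the phenomenon and not of the proof.

```python
import random


def update(prior, g, obs):
    """Bayes update of a finite law by the observation density g(., obs)."""
    w = [g[z][obs] * prior[z] for z in range(len(prior))]
    s = sum(w)
    return [x / s for x in w]


def predict(post, P):
    """Propagate a finite law one step through the transition matrix P."""
    m = len(post)
    return [sum(post[zp] * P[zp][z] for zp in range(m)) for z in range(m)]


def filter_path(init, P, g, ys):
    """Filters at times 0, ..., len(ys) - 1 for the initial law init."""
    pi = update(init, g, ys[0])
    out = [pi]
    for obs in ys[1:]:
        pi = update(predict(pi, P), g, obs)
        out.append(pi)
    return out


def tv(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical model data (not from the paper).
P = [[0.9, 0.1], [0.2, 0.8]]
g = [[0.7, 0.3], [0.4, 0.6]]
rng = random.Random(0)
ys = [rng.randrange(2) for _ in range(200)]   # arbitrary observation record

a = filter_path([0.99, 0.01], P, g, ys)
b = filter_path([0.01, 0.99], P, g, ys)
dists = [tv(p, q) for p, q in zip(a, b)]
print(dists[0], dists[-1])   # the distance starts near 2 and decays towards 0
```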

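Corollary 5.4. Suppose that Assumptions 3.1 and 3.2 hold true. If we
have $\|\mathbf P^\mu(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\to0$, then $\|\Pi^\mu_n-\Pi^\pi_n\|_{\mathrm{TV}}\to0$ $\mathbf P$-a.s. In particular,
if

$$\|\mathbf P^\mu(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0,\qquad\|\mathbf P^\nu(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0,$$

we find that $\|\Pi^\mu_n-\Pi^\nu_n\|_{\mathrm{TV}}\to0$ $\mathbf P$-a.s., $\mathbf P^\mu$-a.s. and $\mathbf P^\nu$-a.s.

Proof. Apply Lemma 3.7 and the triangle inequality. □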

Corollary 5.5. Suppose that Assumption 3.2 holds and that the signal
is Harris recurrent and aperiodic. Then $\|\Pi^\mu_n-\Pi^\nu_n\|_{\mathrm{TV}}\to0$ $\mathbf P^\gamma$-a.s. for all

Proof. It is well known that for Harris recurrent aperiodic Markov
chains which possess a finite invariant measure $\pi$, we have $\|\mathbf P^\mu(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\to0$
as $n\to\infty$ for every probability measure $\mu$ [29], Theorem 6.2.8.
Therefore, Assumption 3.1 follows, and it remains to apply the previous
corollary and Lemma 3.7. □

The remainder of this section is devoted to the proof of Theorem 5.2. 

5.1. Proof of Theorem 5.2: the case $\mu\ll\pi$. We begin by proving stability
of probability measures $\mu$ that are absolutely continuous with respect to the
stationary measure $\pi$. Note that by Assumption 3.1 we have $\|\mathbf P^\mu(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\to0$
as $n\to\infty$ for any $\mu\ll\pi$. We will also need the following
result.

Lemma 5.6. Suppose that Assumption 3.2 holds true and that $\mu\ll\pi$.
Then we have $\Pi^\mu_n(y,\cdot)\ll\Pi^\pi_n(y,\cdot)$ for every $y\in\Omega^Y$, where

$$\frac{d\Pi^\mu_n}{d\Pi^\pi_n}(Y,X_n)=\frac{\mathbf E\big((d\mu/d\pi)(X_0)\,\big|\,\mathcal F^Y_+\vee\mathcal F^X_{[n,\infty[}\big)}{\mathbf E\big((d\mu/d\pi)(X_0)\,\big|\,\mathcal F^Y_{[0,n]}\big)}\quad\mathbf P\text{-a.s.}$$



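Proof. That $\Pi^\mu_n(y,\cdot)\ll\Pi^\pi_n(y,\cdot)$ for every $y\in\Omega^Y$ can be read off directly
from the expression in the proof of Lemma 5.1. Now note that

$$\frac{d\mathbf P^\mu}{d\mathbf P|_{\mathcal F_{[0,\infty[}}}=\frac{d\mu}{d\pi}(X_0),\qquad
\frac{d\mathbf P^\mu|_{\mathcal F^Y_{[0,n]}}}{d\mathbf P|_{\mathcal F^Y_{[0,n]}}}=\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)\,\Big|\,\mathcal F^Y_{[0,n]}\Big).$$

Moreover, it follows easily from Assumption 3.2 that

$$\mathbf P^\mu|_{\mathcal F^Y_{[0,n]}}\sim\mathbf P|_{\mathcal F^Y_{[0,n]}}\quad\text{for every }n\in\mathbb N.$$

Therefore, the conditional expectations $\mathbf P^\mu(X_n\in A\,|\,\mathcal F^Y_{[0,n]})$ are $\mathbf P$-a.s. uniquely
defined and $\mathbf E((d\mu/d\pi)(X_0)\,|\,\mathcal F^Y_{[0,n]})>0$ $\mathbf P$-a.s. We obtain by the Bayes formula

$$\begin{aligned}
\mathbf P^\mu(X_n\in A\,|\,\mathcal F^Y_{[0,n]})
&=\frac{\mathbf E\big(I_A(X_n)\,(d\mu/d\pi)(X_0)\,\big|\,\mathcal F^Y_{[0,n]}\big)}{\mathbf E\big((d\mu/d\pi)(X_0)\,\big|\,\mathcal F^Y_{[0,n]}\big)}\\
&=\frac{\mathbf E\big(I_A(X_n)\,\mathbf E((d\mu/d\pi)(X_0)\,|\,\sigma(X_n)\vee\mathcal F^Y_{[0,n]})\,\big|\,\mathcal F^Y_{[0,n]}\big)}{\mathbf E\big((d\mu/d\pi)(X_0)\,\big|\,\mathcal F^Y_{[0,n]}\big)}.
\end{aligned}$$

Choose a measurable $\Lambda_n:\Omega^Y\times E\to[0,\infty[$ such that

$$\frac{\mathbf E\big((d\mu/d\pi)(X_0)\,\big|\,\sigma(X_n)\vee\mathcal F^Y_{[0,n]}\big)}{\mathbf E\big((d\mu/d\pi)(X_0)\,\big|\,\mathcal F^Y_{[0,n]}\big)}=\Lambda_n(Y,X_n)\quad\mathbf P\text{-a.s.}$$

Then evidently for every $A\in\mathcal B(E)$

$$\Pi^\mu_n(Y,A)=\int I_A(z)\,\Lambda_n(Y,z)\,\Pi^\pi_n(Y,dz)\quad\mathbf P\text{-a.s.}$$

But as $\mathcal B(E)$ is countably generated, it suffices by a monotone class argument
to restrict to $A$ in a countable generating algebra, and we can therefore
eliminate the dependence of the $\mathbf P$-null set on $A$. It remains to note that

$$\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)\,\Big|\,\mathcal F^Y_+\vee\mathcal F^X_{[n,\infty[}\Big)=\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)\,\Big|\,\sigma(X_n)\vee\mathcal F^Y_{[0,n]}\Big)$$

by the Markov property, and the proof is complete. □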

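We immediately obtain the following corollary.

Corollary 5.7. Suppose Assumption 3.2 holds and $\mu\ll\pi$. Then $\mathbf P$-a.s.

$$\|\Pi^\mu_n-\Pi^\pi_n\|_{\mathrm{TV}}=\frac{\mathbf E\Big(\Big|\mathbf E\big((d\mu/d\pi)(X_0)\,\big|\,\mathcal F^Y_+\vee\mathcal F^X_{[n,\infty[}\big)-\mathbf E\big((d\mu/d\pi)(X_0)\,\big|\,\mathcal F^Y_{[0,n]}\big)\Big|\,\Big|\,\mathcal F^Y_{[0,n]}\Big)}{\mathbf E\big((d\mu/d\pi)(X_0)\,\big|\,\mathcal F^Y_{[0,n]}\big)}.$$

Proof. This follows directly from the identity

$$\|\Pi^\mu_n(y,\cdot)-\Pi^\pi_n(y,\cdot)\|_{\mathrm{TV}}=\int\Big|\frac{d\Pi^\mu_n}{d\Pi^\pi_n}(y,z)-1\Big|\,\Pi^\pi_n(y,dz)$$

and the previous lemma. □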
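
The density identity used in the proof of Corollary 5.7 is the elementary fact that, for $p\ll q$, the total variation distance equals $\int|dp/dq-1|\,dq$. A minimal numerical sketch on a three-point space follows; the vectors `p` and `q` are hypothetical, and the convention $\|p-q\|_{\mathrm{TV}}=\sum_i|p_i-q_i|$ is assumed.

```python
def tv(p, q):
    """Total variation distance with the convention ||p - q|| = sum_i |p_i - q_i|."""
    return sum(abs(a - b) for a, b in zip(p, q))


def tv_via_density(p, q):
    """Integral of |dp/dq - 1| against q; assumes p << q (q[i] > 0 wherever p[i] > 0)."""
    return sum(abs(p[i] / q[i] - 1.0) * q[i] for i in range(len(q)) if q[i] > 0)


# Hypothetical laws with p << q (p vanishes on the last point, q charges all points).
p = [0.5, 0.5, 0.0]
q = [0.2, 0.3, 0.5]

print(tv(p, q), tv_via_density(p, q))
```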



We can now complete the proof of Theorem 5.2 for the case $\mu\ll\pi$.

Lemma 5.8. Suppose Assumptions 3.1 and 3.2 hold and $\mu\ll\pi$. Then
$\|\Pi^\mu_n-\Pi^\pi_n\|_{\mathrm{TV}}\to0$ as $n\to\infty$ $\mathbf P$-a.s.,
and therefore also $\mathbf P^\mu$-a.s. as $\mathbf P^\mu\ll\mathbf P$.

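Proof. We aim to establish the $\mathbf P$-a.s. limit of the expression in Corollary 5.7.
Note that the denominator satisfies

$$\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)\,\Big|\,\mathcal F^Y_{[0,n]}\Big)\xrightarrow{n\to\infty}\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)\,\Big|\,\mathcal F^Y_+\Big)=\frac{d\mathbf P^\mu|_{\mathcal F^Y_+}}{d\mathbf P|_{\mathcal F^Y_+}}\quad\mathbf P\text{-a.s.}$$

by martingale convergence. Moreover, $\mathbf P^\mu|_{\mathcal F^Y_+}\sim\mathbf P|_{\mathcal F^Y_+}$ by Lemma 3.7 and
Assumptions 3.1 and 3.2. Therefore, the $\mathbf P$-a.s. limit of the denominator is
$\mathbf P$-a.s. strictly positive. It remains to establish convergence of the numerator.

To this end, note that for any $k\in\mathbb N$ we have $\mathbf P$-a.s.

$$\begin{aligned}
&\Big|\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)\,\Big|\,\mathcal F^Y_+\vee\mathcal F^X_{[n,\infty[}\Big)-\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)\,\Big|\,\mathcal F^Y_{[0,n]}\Big)\Big|\\
&\quad\le\Big|\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)I_{(d\mu/d\pi)\le k}(X_0)\,\Big|\,\mathcal F^Y_+\vee\mathcal F^X_{[n,\infty[}\Big)-\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)I_{(d\mu/d\pi)\le k}(X_0)\,\Big|\,\mathcal F^Y_{[0,n]}\Big)\Big|\\
&\qquad+\Big|\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)I_{(d\mu/d\pi)>k}(X_0)\,\Big|\,\mathcal F^Y_+\vee\mathcal F^X_{[n,\infty[}\Big)-\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)I_{(d\mu/d\pi)>k}(X_0)\,\Big|\,\mathcal F^Y_{[0,n]}\Big)\Big|\\
&\quad\le\Big|\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)I_{(d\mu/d\pi)\le k}(X_0)\,\Big|\,\mathcal F^Y_+\vee\mathcal F^X_{[n,\infty[}\Big)-\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)I_{(d\mu/d\pi)\le k}(X_0)\,\Big|\,\mathcal F^Y_{[0,n]}\Big)\Big|\\
&\qquad+\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)I_{(d\mu/d\pi)>k}(X_0)\,\Big|\,\mathcal F^Y_+\vee\mathcal F^X_{[n,\infty[}\Big)+\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)I_{(d\mu/d\pi)>k}(X_0)\,\Big|\,\mathcal F^Y_{[0,n]}\Big).
\end{aligned}$$

In particular, setting for notational convenience

$$M^k_n=\Big|\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)I_{(d\mu/d\pi)\le k}(X_0)\,\Big|\,\mathcal F^Y_+\vee\mathcal F^X_{[n,\infty[}\Big)-\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)I_{(d\mu/d\pi)\le k}(X_0)\,\Big|\,\mathcal F^Y_{[0,n]}\Big)\Big|,$$

we find that the numerator $R_n$ satisfies

$$\begin{aligned}
R_n&=\mathbf E\Big(\Big|\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)\,\Big|\,\mathcal F^Y_+\vee\mathcal F^X_{[n,\infty[}\Big)-\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)\,\Big|\,\mathcal F^Y_{[0,n]}\Big)\Big|\,\Big|\,\mathcal F^Y_{[0,n]}\Big)\\
&\le\mathbf E\big(M^k_n\,\big|\,\mathcal F^Y_{[0,n]}\big)+2\,\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)I_{(d\mu/d\pi)>k}(X_0)\,\Big|\,\mathcal F^Y_{[0,n]}\Big).
\end{aligned}$$

But $\mathbf E(M^k_n\,|\,\mathcal F^Y_{[0,n]})\to0$ $\mathbf P$-a.s. as $n\to\infty$ by Hunt's lemma [15], Theorem
V.45, as $M^k_n\le k$ for all $n$ and $M^k_n\to0$ $\mathbf P$-a.s. as $n\to\infty$ by martingale
convergence and Theorem 4.2. Moreover, by martingale convergence and
dominated convergence,

$$\limsup_{k\to\infty}\limsup_{n\to\infty}\mathbf E\Big(\frac{d\mu}{d\pi}(X_0)I_{(d\mu/d\pi)>k}(X_0)\,\Big|\,\mathcal F^Y_{[0,n]}\Big)=0\quad\mathbf P\text{-a.s.}$$

Therefore, the numerator converges to zero $\mathbf P$-a.s., and the proof is complete.
□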

Remark 5.9. Along the same lines, one can prove the following result.
Suppose that Assumptions 3.1 and 3.2 hold and that the relative entropy
of $\mu$ with respect to $\pi$ is finite, that is, $D(\mu\|\pi)<\infty$. Then $D(\Pi^\mu_n\|\Pi^\pi_n)\to0$
$\mathbf P$-a.s. as $n\to\infty$. We refer to [9] for further details on the role of relative
entropy in filter stability.

5.2. Proof of Theorem 5.2: the general case. We now devote our attention
to the case where $\mu$ is not necessarily absolutely continuous with respect
to $\pi$. Let us begin by proving the only if part of the theorem.

Lemma 5.10. Suppose that Assumptions 3.1 and 3.2 hold and that

$$\limsup_{n\to\infty}\|\mathbf P^\mu(X_n\in\cdot)-\pi\|_{\mathrm{TV}}>0.$$

Then we must have

$$\mathbf P^\mu\Big(\limsup_{n\to\infty}\|\Pi^\mu_n-\Pi^\pi_n\|_{\mathrm{TV}}=0\Big)<1.$$



Proof. Let $\mathbf P^\mu(X_n\in\cdot)=\mu_n+\mu_n^\perp$ be the Lebesgue decomposition of
$\mathbf P^\mu(X_n\in\cdot)$ with respect to $\pi$. In particular, $\mu_n\ll\pi$ and $\mu_n^\perp\perp\pi$, and there
exists a set $S_n$ such that $\pi(S_n)=0$ and $\mu_n^\perp(S_n^c)=0$. We claim that

$$\limsup_{n\to\infty}\|\mathbf P^\mu(X_n\in\cdot)-\pi\|_{\mathrm{TV}}>0\quad\Longrightarrow\quad\limsup_{n\to\infty}\mathbf P^\mu(X_n\in S_n)>0.$$

Indeed, by [28], Theorem 7.2, Assumption 3.1 and $\mathbf P^\mu(X_n\in S_n)\to0$ as
$n\to\infty$ would imply that $\|\mathbf P^\mu(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\to0$ as $n\to\infty$, which is a
contradiction.

Now note that it is easily established, using the expression in the proof
of Lemma 5.1, that Assumption 3.2 implies $\Pi^\pi_n(y,\cdot)\sim\pi$ for every $y\in\Omega^Y$.
Therefore, $\Pi^\pi_n(y,S_n)=0$ for all $y\in\Omega^Y$, and we can estimate as follows:

$$\Pi^\mu_n(y,S_n)=|\Pi^\mu_n(y,S_n)-\Pi^\pi_n(y,S_n)|\le\|\Pi^\mu_n(y,\cdot)-\Pi^\pi_n(y,\cdot)\|_{\mathrm{TV}}.$$

In particular, we find that

$$\mathbf P^\mu(X_n\in S_n)=\mathbf E^\mu(\Pi^\mu_n(Y,S_n))\le\mathbf E^\mu\big(\|\Pi^\mu_n(Y,\cdot)-\Pi^\pi_n(Y,\cdot)\|_{\mathrm{TV}}\big)$$

and we must therefore have

$$\limsup_{n\to\infty}\mathbf E^\mu\big(\|\Pi^\mu_n(Y,\cdot)-\Pi^\pi_n(Y,\cdot)\|_{\mathrm{TV}}\big)>0.$$

The proof is easily completed. □

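It remains to prove the converse assertion. The idea is to reduce the
general case to the case $\mu\ll\pi$. To this end, we will need the following
lemma.

Lemma 5.11. Suppose that Assumption 3.2 holds. Let $\mu$ and $\rho$ be probability
measures, and let $\mu=\mu^{ac}+\mu^s$ be the Lebesgue decomposition of $\mu$
with respect to $\rho$ (i.e., $\mu^{ac}\ll\rho$ and $\mu^s\perp\rho$). Choose $S$ so that $\rho(S)=1$ and
$\mu^s(S)=0$. Then

$$\Pi^\mu_n(Y,A)=\mathbf P^\mu(X_0\in S\,|\,\mathcal F^Y_{[0,n]})\,\Pi^\nu_n(Y,A)+\mathbf P^\mu(X_0\notin S\,|\,\mathcal F^Y_{[0,n]})\,\Pi^{\nu^\perp}_n(Y,A)$$

$\mathbf P^\mu$-a.s. for every $A\in\mathcal B(E)$, where we have written $\nu=\mu^{ac}/\mu^{ac}(E)$ and
$\nu^\perp=\mu^s/\mu^s(E)$. In particular, we obtain $\mathbf P^\mu$-a.s. the estimate

$$\|\Pi^\mu_n(Y,\cdot)-\Pi^\rho_n(Y,\cdot)\|_{\mathrm{TV}}\le\|\Pi^\nu_n(Y,\cdot)-\Pi^\rho_n(Y,\cdot)\|_{\mathrm{TV}}+2\,\mathbf P^\mu(X_0\notin S\,|\,\mathcal F^Y_{[0,n]}).$$

Proof. Note that $d\nu/d\mu=I_S/\mu^{ac}(E)$. By the Bayes formula, we thus
have

$$\mathbf E^\mu(I_S(X_0)I_A(X_n)\,|\,\mathcal F^Y_{[0,n]})=\mathbf E^\mu(I_S(X_0)\,|\,\mathcal F^Y_{[0,n]})\,\mathbf E^\nu(I_A(X_n)\,|\,\mathcal F^Y_{[0,n]})\quad\mathbf P^\mu\text{-a.s.}$$

Similarly, as $d\nu^\perp/d\mu=I_{S^c}/\mu^s(E)$, we find that

$$\mathbf E^\mu(I_{S^c}(X_0)I_A(X_n)\,|\,\mathcal F^Y_{[0,n]})=\mathbf E^\mu(I_{S^c}(X_0)\,|\,\mathcal F^Y_{[0,n]})\,\mathbf E^{\nu^\perp}(I_A(X_n)\,|\,\mathcal F^Y_{[0,n]})\quad\mathbf P^\mu\text{-a.s.}$$

The first claim now follows by summing these expressions. To prove the second
assertion, let $\mathcal I_k=\{E^k_1,\ldots,E^k_k\}$ be an increasing sequence of partitions
of $E$ as in the proof of Lemma 2.4. Then we can estimate

$$\begin{aligned}
\sum_{\ell=1}^k|\Pi^\mu_n(Y,E^k_\ell)-\Pi^\rho_n(Y,E^k_\ell)|
&\le\mathbf P^\mu(X_0\in S\,|\,\mathcal F^Y_{[0,n]})\sum_{\ell=1}^k|\Pi^\nu_n(Y,E^k_\ell)-\Pi^\rho_n(Y,E^k_\ell)|\\
&\quad+\mathbf P^\mu(X_0\notin S\,|\,\mathcal F^Y_{[0,n]})\sum_{\ell=1}^k\big(\Pi^{\nu^\perp}_n(Y,E^k_\ell)+\Pi^\rho_n(Y,E^k_\ell)\big)\\
&\le\sum_{\ell=1}^k|\Pi^\nu_n(Y,E^k_\ell)-\Pi^\rho_n(Y,E^k_\ell)|+2\,\mathbf P^\mu(X_0\notin S\,|\,\mathcal F^Y_{[0,n]})\quad\mathbf P^\mu\text{-a.s.}
\end{aligned}$$

It remains to take the limit as $k\to\infty$. □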
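
The mixture representation in Lemma 5.11 can be verified by brute force on a finite state space. In the sketch below the reference measure is assumed to charge exactly the set `S = {0, 1}`, so that the absolutely continuous part of `mu` is its restriction to `S`; the matrices `P` and `g`, the law `mu` and the observation record `ys` are hypothetical. The function `posterior` returns both the filter and the conditional probability that the initial state lies in `S`.

```python
from itertools import product


def posterior(init, P, g, ys, S):
    """Brute-force Bayes formula on a finite state space.

    Returns the filter at time len(ys) - 1 and the conditional probability
    that X_0 lies in S given the observations, under the initial law init.
    """
    m, n = len(init), len(ys)
    end, in_S, total = [0.0] * m, 0.0, 0.0
    for path in product(range(m), repeat=n):
        w = init[path[0]] * g[path[0]][ys[0]]
        for k in range(1, n):
            w *= P[path[k - 1]][path[k]] * g[path[k]][ys[k]]
        end[path[-1]] += w
        total += w
        if path[0] in S:
            in_S += w
    return [e / total for e in end], in_S / total


# Hypothetical model data (not from the paper).
P = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]
g = [[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]]
ys = [0, 1, 0]

mu = [0.3, 0.3, 0.4]
S = {0, 1}                   # rho(S) = 1 and mu^s(S) = 0 by assumption
nu = [0.5, 0.5, 0.0]         # mu^ac / mu^ac(E)
nu_perp = [0.0, 0.0, 1.0]    # mu^s / mu^s(E)

pi_mu, p_S = posterior(mu, P, g, ys, S)
pi_nu, _ = posterior(nu, P, g, ys, S)
pi_perp, _ = posterior(nu_perp, P, g, ys, S)
mixture = [p_S * a + (1.0 - p_S) * b for a, b in zip(pi_nu, pi_perp)]
print(pi_mu, mixture)
```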

Note that in this result $\nu\ll\rho$ by construction. In particular, presuming
that Assumptions 3.1 and 3.2 hold true and that $\|\mathbf P^\mu(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\to0$,
and substituting $\pi$ for $\rho$, it is not difficult to establish using Lemmas 5.8
and 3.7 that

$$\limsup_{n\to\infty}\|\Pi^\mu_n(Y,\cdot)-\Pi^\pi_n(Y,\cdot)\|_{\mathrm{TV}}\le2\,\mathbf P^\mu(X_0\notin S\,|\,\mathcal F^Y_+)\quad\mathbf P^\mu\text{-a.s.}$$

We can therefore eliminate the absolutely continuous part of the initial measure
$\mu$ using the stability for the case $\mu\ll\pi$ (Lemma 5.8). However, the
singular part leaves the residual quantity $\mathbf P^\mu(X_0\notin S\,|\,\mathcal F^Y_+)$, and it remains
to eliminate this term. To resolve this problem, we will exploit the recursive
property of the filter. Together with Lemma 5.10, the following result
completes the proof of Theorem 5.2.

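Lemma 5.12. Suppose that Assumptions 3.1 and 3.2 hold and that

$$\limsup_{n\to\infty}\|\mathbf P^\mu(X_n\in\cdot)-\pi\|_{\mathrm{TV}}=0.$$

Then we must have

$$\limsup_{n\to\infty}\|\Pi^\mu_n-\Pi^\pi_n\|_{\mathrm{TV}}=0\quad\mathbf P^\mu\text{-a.s.}$$

Proof. Define the following probability kernels:

$$\Upsilon^\mu_0(y,A)=\mu(A),\qquad\Upsilon^\mu_n(y,A)=\int I_A(z)\,P(z',dz)\,\Pi^\mu_{n-1}(y,dz').$$

Then by Lemma 5.1, the filter satisfies the recursive property

$$\Pi^\mu_{n+k}(y,A)=\Pi^{\Upsilon^\mu_n(y,\cdot)}_k(\Theta^ny,A)\quad\text{for all }k,n\in\mathbb Z_+,\ y\in\Omega^Y,\ A\in\mathcal B(E).$$

In particular, we can write

$$\limsup_{k\to\infty}\|\Pi^\mu_k(y,\cdot)-\Pi^\pi_k(y,\cdot)\|_{\mathrm{TV}}
=\limsup_{k\to\infty}\big\|\Pi^{\Upsilon^\mu_n(y,\cdot)}_k(\Theta^ny,\cdot)-\Pi^{\Upsilon^\pi_n(y,\cdot)}_k(\Theta^ny,\cdot)\big\|_{\mathrm{TV}}\quad\text{for all }n\in\mathbb Z_+.$$

But from routine manipulations, it follows that, for any $B\in\mathcal F^Y_{[0,\infty[}$,

$$\mathbf E^\mu(I_B\circ\Theta^n\,|\,\mathcal F^Y_{[0,n-1]})=\mathbf P^{\Upsilon^\mu_n(Y,\cdot)}(B)\quad\mathbf P^\mu\text{-a.s.}$$

Therefore,

$$\begin{aligned}
&\mathbf E^\mu\Big(\limsup_{k\to\infty}\|\Pi^\mu_k(Y,\cdot)-\Pi^\pi_k(Y,\cdot)\|_{\mathrm{TV}}\,\Big|\,\mathcal F^Y_{[0,n-1]}\Big)\\
&\quad=\mathbf E^\mu\Big(\limsup_{k\to\infty}\big\|\Pi^{\Upsilon^\mu_n(Y,\cdot)}_k(Y\circ\Theta^n,\cdot)-\Pi^{\Upsilon^\pi_n(Y,\cdot)}_k(Y\circ\Theta^n,\cdot)\big\|_{\mathrm{TV}}\,\Big|\,\mathcal F^Y_{[0,n-1]}\Big)\\
&\quad=\mathbf E^\mu\Big(\limsup_{k\to\infty}\big\|\Pi^{\Upsilon^\mu_n(y,\cdot)}_k(Y\circ\Theta^n,\cdot)-\Pi^{\Upsilon^\pi_n(y,\cdot)}_k(Y\circ\Theta^n,\cdot)\big\|_{\mathrm{TV}}\,\Big|\,\mathcal F^Y_{[0,n-1]}\Big)\Big|_{y=Y}\\
&\quad=\mathbf E^{\Upsilon^\mu_n(y,\cdot)}\Big(\limsup_{k\to\infty}\big\|\Pi^{\Upsilon^\mu_n(y,\cdot)}_k(Y,\cdot)-\Pi^{\Upsilon^\pi_n(y,\cdot)}_k(Y,\cdot)\big\|_{\mathrm{TV}}\Big)\Big|_{y=Y}\quad\mathbf P^\mu\text{-a.s.},
\end{aligned}$$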

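where we have used that $\Upsilon^\mu_n(Y,\cdot)$ is $\mathcal F^Y_{[0,n-1]}$-measurable.

For the time being, let us fix a $y\in\Omega^Y$. Note that it is easily established, using
the expression in the proof of Lemma 5.1, that $\Upsilon^\mu_n(y,\cdot)\sim\mathbf P^\mu(X_n\in\cdot)$ for
every $\mu,n,y$. Denote by $\mathbf P^\mu(X_n\in\cdot)=\mu_n+\mu_n^\perp$ the Lebesgue decomposition of
$\mathbf P^\mu(X_n\in\cdot)$ with respect to $\pi$ (i.e., $\mu_n\ll\pi$ and $\mu_n^\perp\perp\pi$), and choose $S_n$ such
that $\pi(S_n)=1$ and $\mu_n^\perp(S_n)=0$. Then clearly $\Upsilon^\mu_n(y,\cdot)=\nu_n(y,\cdot)+\nu_n^\perp(y,\cdot)$
with

$$\nu_n(y,A)=\Upsilon^\mu_n(y,A\cap S_n),\qquad\nu_n^\perp(y,A)=\Upsilon^\mu_n(y,A\cap S_n^c)$$

is the Lebesgue decomposition of $\Upsilon^\mu_n(y,\cdot)$ with respect to $\Upsilon^\pi_n(y,\cdot)$ [i.e., $\nu_n(y,\cdot)\ll\Upsilon^\pi_n(y,\cdot)$
and $\nu_n^\perp(y,\cdot)\perp\Upsilon^\pi_n(y,\cdot)$]. By Lemma 5.11, we find that

$$\begin{aligned}
&\big\|\Pi^{\Upsilon^\mu_n(y,\cdot)}_k(Y,\cdot)-\Pi^{\Upsilon^\pi_n(y,\cdot)}_k(Y,\cdot)\big\|_{\mathrm{TV}}\\
&\quad\le\big\|\Pi^{\nu_n(y,\cdot)/\nu_n(y,E)}_k(Y,\cdot)-\Pi^{\Upsilon^\pi_n(y,\cdot)}_k(Y,\cdot)\big\|_{\mathrm{TV}}+2\,\mathbf P^{\Upsilon^\mu_n(y,\cdot)}(X_0\notin S_n\,|\,\mathcal F^Y_{[0,k]})\\
&\quad\le\big\|\Pi^{\nu_n(y,\cdot)/\nu_n(y,E)}_k(Y,\cdot)-\Pi^\pi_k(Y,\cdot)\big\|_{\mathrm{TV}}+\big\|\Pi^{\Upsilon^\pi_n(y,\cdot)}_k(Y,\cdot)-\Pi^\pi_k(Y,\cdot)\big\|_{\mathrm{TV}}\\
&\qquad+2\,\mathbf P^{\Upsilon^\mu_n(y,\cdot)}(X_0\notin S_n\,|\,\mathcal F^Y_{[0,k]})\quad\mathbf P^{\Upsilon^\mu_n(y,\cdot)}\text{-a.s.}
\end{aligned}$$

But $\nu_n(y,\cdot)\ll\pi$ and $\Upsilon^\pi_n(y,\cdot)\sim\pi$, so by Lemma 5.8 the first two terms on the
right converge to zero as $k\to\infty$ $\mathbf P$-a.s. We claim that this convergence also
holds $\mathbf P^{\Upsilon^\mu_n(y,\cdot)}$-a.s. Indeed, recall that $\Upsilon^\mu_n(y,\cdot)\sim\mathbf P^\mu(X_n\in\cdot):=\rho_n$, so that
the claim is established if we can show that $\mathbf P^{\rho_n}|_{\mathcal F^Y_+}\sim\mathbf P|_{\mathcal F^Y_+}$. But
$\|\mathbf P^{\rho_n}(X_k\in\cdot)-\pi\|_{\mathrm{TV}}=\|\mathbf P^\mu(X_{n+k}\in\cdot)-\pi\|_{\mathrm{TV}}\to0$ as $k\to\infty$, so the claim follows from Lemma
3.7.

We have now established that, for every $y\in\Omega^Y$,

$$\mathbf E^{\Upsilon^\mu_n(y,\cdot)}\Big(\limsup_{k\to\infty}\big\|\Pi^{\Upsilon^\mu_n(y,\cdot)}_k(Y,\cdot)-\Pi^{\Upsilon^\pi_n(y,\cdot)}_k(Y,\cdot)\big\|_{\mathrm{TV}}\Big)\le2\,\mathbf P^{\Upsilon^\mu_n(y,\cdot)}(X_0\notin S_n).$$

In particular, this implies that $\mathbf P^\mu$-a.s.

$$\mathbf E^\mu\Big(\limsup_{k\to\infty}\|\Pi^\mu_k(Y,\cdot)-\Pi^\pi_k(Y,\cdot)\|_{\mathrm{TV}}\,\Big|\,\mathcal F^Y_{[0,n-1]}\Big)\le2\,\mathbf P^\mu(X_n\notin S_n\,|\,\mathcal F^Y_{[0,n-1]})$$

and, therefore, we have for all $n\in\mathbb N$

$$\mathbf E^\mu\Big(\limsup_{k\to\infty}\|\Pi^\mu_k(Y,\cdot)-\Pi^\pi_k(Y,\cdot)\|_{\mathrm{TV}}\Big)\le2\,\mathbf P^\mu(X_n\notin S_n)=2\mu_n^\perp(E).$$

But by the assumption that $\|\mathbf P^\mu(X_n\in\cdot)-\pi\|_{\mathrm{TV}}\to0$, we must have $\mu_n^\perp(E)\to0$
as $n\to\infty$. Thus, the proof is complete. □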
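
The recursive property of the filter exploited in the proof of Lemma 5.12 says that running the filter for $n+k$ steps is the same as restarting it at time $n$ from the one-step prediction of the time $n-1$ filter and feeding it the shifted observation record. A minimal finite-state sketch follows; the model data and the split point `n` are hypothetical, and `upsilon` plays the role of the prediction kernel used in the proof.

```python
def update(prior, g, obs):
    """Bayes update of a finite law by the observation density g(., obs)."""
    w = [g[z][obs] * prior[z] for z in range(len(prior))]
    s = sum(w)
    return [x / s for x in w]


def predict(post, P):
    """Propagate a finite law one step through the transition matrix P."""
    m = len(post)
    return [sum(post[zp] * P[zp][z] for zp in range(m)) for z in range(m)]


def run_filter(init, P, g, ys):
    """Filter at time len(ys) - 1 for the initial law init and observations ys."""
    pi = update(init, g, ys[0])
    for obs in ys[1:]:
        pi = update(predict(pi, P), g, obs)
    return pi


# Hypothetical model data (not from the paper).
P = [[0.9, 0.1], [0.2, 0.8]]
g = [[0.7, 0.3], [0.4, 0.6]]
mu = [0.3, 0.7]
ys = [0, 1, 1, 0, 1, 0]
n = 3

direct = run_filter(mu, P, g, ys)                    # filter at time n + k = 5
upsilon = predict(run_filter(mu, P, g, ys[:n]), P)   # prediction of X_n from y(0..n-1)
restarted = run_filter(upsilon, P, g, ys[n:])        # restart on the shifted record
print(direct, restarted)
```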
6. Continuous time.

6.1. The hidden Markov model in continuous time. Up to this point we 
have exclusively dealt with Markov chains and hidden Markov models in 
discrete time. In this section, we will prove analogous results for continuous 
time filtering models by reducing them to the discrete time setting. First, 
however, we must introduce the class of continuous time models in which 
we will be interested. 

We consider an $E$-valued signal $(\xi_t)_{t\in\mathbb R}$ and an $F$-valued observation
$(\eta_t)_{t\in\mathbb R}$, where $E$ is a Polish space and $F$ is a Polish topological vector
space. We will realize these processes on the canonical path space $\Omega=\Omega^\xi\times\Omega^\eta$,
where $\Omega^\xi=D(\mathbb R;E)$ and $\Omega^\eta=D(\mathbb R;F)$ are, respectively, the Skorokhod
spaces of $E$- and $F$-valued cadlag paths. Denote by $\mathcal F$ the Borel $\sigma$-field
on $\Omega$, and we introduce the natural filtrations $\mathcal F_t$, $\mathcal F^\xi_t$, $\mathcal F^\eta_t$ in complete analogy
with the discrete time case.

Moreover, we define for intervals $[s,t]$ ($s\le t$) the $\sigma$-fields

$$\mathcal F^\xi_{[s,t]}=\sigma\{\xi_r:r\in[s,t]\},\qquad\mathcal F^\eta_{[s,t]}=\sigma\{\eta_r-\eta_s:r\in[s,t]\}$$

and we set $\mathcal F_{[s,t]}=\mathcal F^\xi_{[s,t]}\vee\mathcal F^\eta_{[s,t]}$. Finally, we define

t>0 t>0 t>0 t>0 

The canonical shift is defined as $\Theta^s(\xi,\eta)(t)=(\xi(s+t),\eta(s+t)-\eta(s))$.
The continuous time hidden Markov model now consists of the following:

1. A probability kernel $\mathbf Q:E\times\mathcal F^\xi_+\to[0,1]$ such that, for every $A\in\mathcal B(E)$,

$$\mathbf Q^z(\xi_t\in A\,|\,\mathcal F^\xi_{[0,s]})=\mathbf Q^{\xi_s}(\xi_{t-s}\in A)\quad\mathbf Q^z\text{-a.s. for all }z\in E,\ t\ge s\ge0,$$

and such that $\mathbf Q^z(\xi_0=z)=1$ for all $z\in E$.

2. A probability measure $\pi$ such that

$$\int\mathbf Q^z(\xi_t\in A)\,\pi(dz)=\pi(A)\quad\text{for all }A\in\mathcal B(E),\ t\ge0.$$

3. A probability kernel $\Phi:\Omega^\xi\times\mathcal F^\eta\to[0,1]$ such that $(\eta_t)_{t\in\mathbb R}$ has independent
increments with respect to $\Phi(\xi,\cdot)$ for every $\xi\in\Omega^\xi$ and such that

$$\int I_A(\Theta^s\eta)\,\Phi(\xi,d\eta)=\Phi(\Theta^s\xi,A)\quad\text{for all }\xi\in\Omega^\xi,\ A\in\mathcal F^\eta,\ s\in\mathbb R.$$

We assume, moreover, that $\Phi(\cdot,A)$ is $\mathcal F^\xi_{[s,t]}$-measurable for every $A\in\mathcal F^\eta_{[s,t]}$.

For any probability measure $\mu$ on $\mathcal B(E)$, we define

$$\mathbf Q^\mu(A)=\int\mathbf Q^z(A)\,\mu(dz)\quad\text{for all }A\in\mathcal F^\xi_+.$$

Then under $\mathbf Q^\mu$, the signal $(\xi_t)_{t\ge0}$ is a time homogeneous Markov process
with initial measure $\xi_0\sim\mu$. In particular, under $\mathbf Q^\pi$ the signal is a stationary
Markov process with stationary measure $\pi$. We can therefore extend the
measure $\mathbf Q^\pi$ to two-sided time $\mathcal F^\xi$ in the usual fashion, and we denote this
extended measure as $\mathbf Q$. In particular, under $\mathbf Q$ the entire signal $(\xi_t)_{t\in\mathbb R}$ is
a stationary Markov process with stationary measure $\pi$. We now define the
probability measure $\mathbf P$ on $\mathcal F$ as

$$\mathbf P(A)=\int I_A(\xi,\eta)\,\Phi(\xi,d\eta)\,\mathbf Q(d\xi)\quad\text{for all }A\in\mathcal F$$

and we similarly define the measures $\mathbf P^\mu$ on $\mathcal F^\xi_+\vee\mathcal F^\eta_+$ as

$$\mathbf P^\mu(A)=\int I_A(\xi,\eta)\,\Phi(\xi,d\eta)\,\mathbf Q^\mu(d\xi)\quad\text{for all }A\in\mathcal F^\xi_+\vee\mathcal F^\eta_+.$$

Then $\mathbf P^\mu$ defines the hidden Markov model with initial measure $\mu$, while
$\mathbf P$ defines the stationary hidden Markov model. Note that the stationary
measure $\mathbf P$ is invariant under the canonical shift $\Theta^s$ by construction.

We now introduce the continuous time counterparts of Assumptions 3.1 
and 3.2. 

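Assumption 6.1 (Ergodicity). The following holds:

$$\|\mathbf Q^z(\xi_t\in\cdot)-\pi\|_{\mathrm{TV}}\xrightarrow{t\to\infty}0\quad\text{for }\pi\text{-a.e. }z\in E.$$

Assumption 6.2 (Nondegeneracy). There exists a probability measure
$\varphi$ on $\mathcal F^\eta$ and a family $(\Sigma_{s,t})_{s\le t}$ of strictly positive random variables such
that

$$\Phi(\xi,A)=\int I_A(\eta)\,\Sigma_{s,t}(\xi,\eta)\,\varphi(d\eta)\quad\text{for all }A\in\mathcal F^\eta_{[s,t]},\ \xi\in\Omega^\xi,\ s\le t,$$

and such that $\Sigma_{s,t}$ is $\mathcal F_{[s,t]}$-measurable for every $s\le t$.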

Our guiding example in which a kernel $\Phi$ can be constructed that satisfies
all the required conditions is the ubiquitous filtering model with white noise
observations. Though none of our results rely specifically on this model,
let us take a moment to show that it does indeed fit within our general
framework.

Example 6.3 (White noise observations). Set $F=\mathbb R^d$ for some $d<\infty$,
and let $\varphi$ be the probability measure which makes $(\eta_t)_{t\in\mathbb R}$ a two-sided
$d$-dimensional Wiener process. Such a probability measure is easily constructed;
indeed, let $\mathbf W$ be the canonical Wiener measure on $C([0,\infty[;\mathbb R^d)$,
and define the measurable function $\alpha:C([0,\infty[;\mathbb R^d)\times C([0,\infty[;\mathbb R^d)\to D(\mathbb R;\mathbb R^d)$
as

$$\alpha(\eta^-,\eta^+)(t)=\begin{cases}\eta^-(-t),&\text{if }t<0,\\ \eta^+(t),&\text{if }t\ge0.\end{cases}$$

Then $\varphi=(\mathbf W\otimes\mathbf W)\circ\alpha^{-1}$. Note that $\varphi$ is invariant under the shift $\Theta^s$.
Let $h:E\to\mathbb R^d$ be a continuous function (the observation function), so
that $t\mapsto h(\xi_t)$ is cadlag. By [22], we may define an $\mathcal F_{[s,t]}$-measurable map
$\Sigma_{s,t}$ so that

$$\Sigma_{s,t}(\xi,\eta)=\exp\Big(\int_s^th(\xi_r)\cdot d\eta_r-\frac12\int_s^t\|h(\xi_r)\|^2\,dr\Big)\quad\text{for }\varphi\text{-a.e. }\eta\in\Omega^\eta$$

for every $\xi\in\Omega^\xi$. Note that $\Sigma_{s,t}$ is strictly positive by construction. We now
define for every $s\le t$ the probability kernel $\Phi_{s,t}:\Omega^\xi\times\mathcal F^\eta_{[s,t]}\to[0,1]$ as

$$\Phi_{s,t}(\xi,A)=\int I_A(\eta)\,\Sigma_{s,t}(\xi,\eta)\,\varphi(d\eta)\quad\text{for all }A\in\mathcal F^\eta_{[s,t]},\ \xi\in\Omega^\xi.$$

Define the process

$$\tilde\eta_r=\eta_{r+s}-\eta_s-\int_s^{r+s}h(\xi_u)\,du.$$

Then by Girsanov's theorem, $(\tilde\eta_r)_{r\in[0,t-s]}$ is a standard $d$-dimensional Wiener
process under $\Phi_{s,t}(\xi,\cdot)$ for every $\xi\in\Omega^\xi$, as $t\mapsto h(\xi_t)$ is cadlag and hence
locally bounded (the usual conditions, which we have not assumed, are
not needed for this to hold; see [38], Chapter 10). It remains to note that
$\{\Phi_{s,t}(\xi,\cdot):s\le t\}$ is a consistent family, so there exists a probability kernel
$\Phi:\Omega^\xi\times\mathcal F^\eta\to[0,1]$ with

$$\Phi(\xi,A)=\Phi_{s,t}(\xi,A)\quad\text{for all }A\in\mathcal F^\eta_{[s,t]},\ \xi\in\Omega^\xi,\ s\le t,$$

by the usual Kolmogorov extension argument. It is easily verified that $\Phi$
satisfies the required properties, and Assumption 6.2 holds true by construction.
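
For intuition, the density in Example 6.3 can be approximated on a time grid by replacing the stochastic integral with a left-point sum. The following minimal sketch treats the scalar case $d=1$ only; the function names, the uniform grid of step `dt` and the sample increments are hypothetical, and the sketch is a discretization under these assumptions rather than the construction of [22].

```python
import math


def log_density(h_vals, eta, dt):
    """Left-point approximation of int h(xi_r) d eta_r - (1/2) int h(xi_r)^2 dr.

    h_vals[i] stands for h(xi) at the i-th grid time and eta[i] for the observation
    path at the same time; the grid is uniform with step dt (scalar case).
    """
    total = 0.0
    for i in range(len(eta) - 1):
        d_eta = eta[i + 1] - eta[i]
        total += h_vals[i] * d_eta - 0.5 * h_vals[i] ** 2 * dt
    return total


def density(h_vals, eta, dt):
    """Approximate value of the strictly positive density on the grid."""
    return math.exp(log_density(h_vals, eta, dt))


# Hypothetical observation path on [0, 1] with four steps.
eta = [0.0, 0.1, -0.05, 0.2, 0.3]
dt = 0.25

# For a constant observation function h = c the sum telescopes to
# c * (eta_1 - eta_0) - c^2 / 2, which gives a simple sanity check.
c = 2.0
value = log_density([c] * 4, eta, dt)
print(value, density([0.0] * 4, eta, dt))
```

With $h\equiv0$ the approximate density is identically one, which matches the fact that the observation is then a Wiener process under the reference measure.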

From this point onward we consider again the general continuous time
setting (i.e., we do not assume white noise observations). The goal of this
section is to extend several of our discrete time results to the continuous
time setting. To this end, we will first prove the following counterpart of
Theorem 4.2.

Theorem 6.4. Suppose that Assumptions 6.1 and 6.2 are in force. Then

$$\bigcap_{t\ge0}\mathcal F^\eta_+\vee\sigma\{\xi_s:s\ge t\}=\mathcal F^\eta_+\quad\text{and}\quad\bigcap_{t\ge0}\mathcal F^\eta_0\vee\mathcal F^\xi_{-t}=\mathcal F^\eta_0\quad\mathbf P\text{-a.s.}$$

We now turn to the filter stability problem. As in discrete time, we
must choose suitable versions of the regular conditional probabilities $\mathbf P^\mu(\xi_t\in\cdot\,|\,\mathcal F^\eta_{[0,t]})$.

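Lemma 6.5. Suppose Assumption 6.2 holds. For any probability measure
$\mu$ on $\mathcal B(E)$, define a family of probability kernels $\Pi^\mu_t:\Omega^\eta\times\mathcal B(E)\to[0,1]$
($t\ge0$) by

$$\Pi^\mu_t(\eta,A)=\frac{\int I_A(\xi_t)\,\Sigma_{0,t}(\xi,\eta)\,\mathbf Q^\mu(d\xi)}{\int\Sigma_{0,t}(\xi,\eta)\,\mathbf Q^\mu(d\xi)}.$$

Then $\Pi^\mu_t$ is a version of the regular conditional probability $\mathbf P^\mu(\xi_t\in\cdot\,|\,\mathcal F^\eta_{[0,t]})$.

Proof. Apply the Bayes formula as in Lemma 5.1. □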



We can now prove a counterpart of Theorem 5.2. Note that the continuous 
time result yields a slightly weaker type of convergence than its discrete time 
counterpart; the reason for this choice is explained in the remark below. 



Theorem 6.6. Suppose that Assumptions 6.1 and 6.2 hold. Then

$$\mathbf E^\mu\big(\|\Pi^\mu_t-\Pi^\pi_t\|_{\mathrm{TV}}\big)\xrightarrow{t\to\infty}0\qquad\text{iff}\qquad\|\mathbf P^\mu(\xi_t\in\cdot)-\pi\|_{\mathrm{TV}}\xrightarrow{t\to\infty}0.$$

Moreover, if

$$\|\mathbf P^\mu(\xi_t\in\cdot)-\pi\|_{\mathrm{TV}}\xrightarrow{t\to\infty}0\quad\text{and}\quad\|\mathbf P^\nu(\xi_t\in\cdot)-\pi\|_{\mathrm{TV}}\xrightarrow{t\to\infty}0,$$

then $\mathbf E^\nu(\|\Pi^\mu_t-\Pi^\pi_t\|_{\mathrm{TV}})\to0$ as $t\to\infty$.

Remark 6.7. Theorem 5.2 yields almost sure convergence of the filtering
error, while Theorem 6.6 only gives convergence in $L^1$. The subtlety lies
in the fact that convergence results for stochastic processes in continuous
time, such as the martingale convergence theorem, require the choice of a
modification of the stochastic process with appropriate continuity properties,
and this typically requires that the filtrations satisfy the usual conditions
(the associated $\sigma$-fields are therefore no longer countably generated).
Though it seems very likely that such issues can be resolved with sufficient
care, for example, along the lines of [39], we have chosen the simpler route
which avoids unnecessary complications at the expense of a slightly weaker
notion of convergence.



The remainder of this section is devoted to the proofs of Theorems 6.4
and 6.6.

6.2. Reduction to discrete time. The proofs in the continuous time setting
can largely be reduced to our previous discrete time results. To this end,
we begin by constructing a discrete time hidden Markov model, as defined in
Section 3.1, which coincides with the continuous time model of this section.

The signal and observation state spaces for our discrete model are taken
to be $\tilde E=D([0,1];E)$ and $\tilde F=D([0,1];F)$, respectively (recall that these
Skorokhod spaces are themselves Polish). For the discrete time signal we
will choose the $\tilde E$-valued process $X_n=(\xi_t)_{n\le t\le n+1}$, while we choose for the
discrete time observations the $\tilde F$-valued process $Y_n=(\eta_t-\eta_n)_{n\le t\le n+1}$. We
claim that these processes define a hidden Markov model in the sense of Section 3.1.
Indeed, it is easily seen that $X_n$ is a Markov process with transition
probability kernel

$$\tilde P(\xi',A)=\mathbf Q^{\xi'(1)}\big((\xi_t)_{0\le t\le1}\in A\big)\quad\text{for all }\xi'\in\tilde E,\ A\in\mathcal B(\tilde E)$$

and invariant measure

$$\tilde\pi(A)=\mathbf P\big((\xi_t)_{0\le t\le1}\in A\big)\quad\text{for all }A\in\mathcal B(\tilde E).$$

On the other hand, given $\mathcal F^\xi=\mathcal F^X$, the random variables $Y_n$ are independent
(as $\eta_t$ has conditionally independent increments given $\mathcal F^\xi$) and we may define

$$\tilde\Phi\big((\xi(t))_{0\le t\le1},A\big)=\Phi(\xi,Y_0\in A)\quad\text{for all }\xi\in\Omega^\xi,\ A\in\mathcal B(\tilde F),$$

where we have used that $\Phi(\xi,A)$ is $\mathcal F^\xi_{[0,1]}$-measurable for $A\in\mathcal F^\eta_{[0,1]}$ and that

$$\mathbf P(Y_n\in A\,|\,\mathcal F^\xi)=\Phi(\xi,Y_n\in A)=\Phi(\Theta^n\xi,Y_0\in A)=\tilde\Phi(X_n,A).$$

Having defined the kernels $\tilde P$ and $\tilde\Phi$ and the measure $\tilde\pi$, we may now construct
the process $(X_n,Y_n)_{n\in\mathbb Z}$ on its canonical path space as in Section 3.1,
and it is easily verified that the measures $\tilde{\mathbf P}$ and $\tilde{\mathbf P}^{\tilde\mu}$ coincide with the law of
the process $(X_n,Y_n)$ under $\mathbf P$ and $\mathbf P^\mu$, respectively, where $\tilde\mu=\mathbf P^\mu(X_0\in\cdot)$.
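
The blocking construction above is easy to visualize on sampled paths. In the following minimal sketch the signal and observation paths are assumed to be sampled on a grid of step $1/m$; the function `to_blocks` and the sample data are hypothetical. Each signal block is the path segment over a unit interval (endpoints included), and each observation block is the corresponding segment of increments, re-based to start at zero.

```python
def to_blocks(xi, eta, m):
    """Cut sampled paths into unit-interval blocks.

    xi and eta hold samples at times 0, 1/m, 2/m, ..., N. Block n of the signal is
    the segment over [n, n + 1]; block n of the observation is the segment of
    increments eta_t - eta_n over the same interval.
    """
    n_blocks = (len(xi) - 1) // m
    X, Y = [], []
    for n in range(n_blocks):
        lo, hi = n * m, (n + 1) * m
        X.append(xi[lo:hi + 1])
        Y.append([eta[i] - eta[lo] for i in range(lo, hi + 1)])
    return X, Y


# Hypothetical sampled paths on [0, 3] with m = 2 samples per unit of time.
xi = [0, 1, 1, 2, 0, 0, 1]
eta = [0.0, 0.5, 0.3, 0.9, 1.4, 1.0, 1.6]

X, Y = to_blocks(xi, eta, 2)
print(X)
print(Y)
```

Consecutive signal blocks share an endpoint, which is why the transition kernel of the block chain depends on the previous block only through its terminal value.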

Lemma 6.8. Assumption 6.1 implies Assumption 3.1 for the discrete
chain. Similarly, Assumption 6.2 implies Assumption 3.2 for the discrete
chain.

Proof. By the Markov property, we find that

$$\big\|\mathbf Q^z\big((\xi_t)_{n\le t\le n+1}\in\cdot\big)-\tilde\pi\big\|_{\mathrm{TV}}=\|\mathbf Q^z(\xi_n\in\cdot)-\pi\|_{\mathrm{TV}}.$$

But note also that

$$\mathbf Q^z\big((\xi_t)_{n\le t\le n+1}\in\cdot\big)=\tilde{\mathbf P}^{\xi'}(X_{n+1}\in\cdot)\quad\text{for all }\xi'\in\tilde E\text{ with }\xi'(1)=z.$$

The first statement follows directly. To prove the second statement, it suffices
to note that under Assumption 6.2 we can write for $\xi\in\Omega^\xi$

$$\tilde\Phi\big((\xi_t)_{0\le t\le1},A\big)=\int I_A\big((\eta_t-\eta_0)_{0\le t\le1}\big)\,\Sigma_{0,1}\big((\xi_t)_{0\le t\le1},(\eta_t-\eta_0)_{0\le t\le1}\big)\,\varphi(d\eta),$$

so we may set $\tilde\varphi(A)=\varphi(Y_0\in A)$ and $g(z,u)=\Sigma_{0,1}(z,u)$. □

The proof of Theorem 6.4 now follows immediately.

Proof of Theorem 6.4. The result follows immediately from Theorem 4.2
in view of the fact that the measures $\mathbf P$ and $\tilde{\mathbf P}$ coincide. □

Before we proceed, let us prove a continuous time counterpart of Lemma 
3.7. 

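Lemma 6.9. Suppose Assumption 6.2 holds. Let $\nu,\rho$ be probability measures
such that $\|\mathbf P^\nu(\xi_t\in\cdot)-\mathbf P^\rho(\xi_t\in\cdot)\|_{\mathrm{TV}}\to0$ as $t\to\infty$. Then $\mathbf P^\nu|_{\mathcal F^\eta_+}\sim\mathbf P^\rho|_{\mathcal F^\eta_+}$.

Proof. The result follows from Lemma 3.7, in view of the equivalence
of the measures $\mathbf P^\mu$ and $\tilde{\mathbf P}^{\tilde\mu}$ ($\tilde\mu=\mathbf P^\mu(X_0\in\cdot)$) for any $\mu$, using the same
argument as in the proof of the first assertion of Lemma 6.8. □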

6.3. Proof of Theorem 6.6. As in the discrete time setting, we begin by 
proving the only if part of Theorem 6.6. The proof is essentially identical. 

Lemma 6.10. Suppose that Assumptions 6.1 and 6.2 hold and that

$$\limsup_{t\to\infty}\|\mathbf P^\mu(\xi_t\in\cdot)-\pi\|_{\mathrm{TV}}>0.$$

Then we must have

$$\limsup_{t\to\infty}\mathbf E^\mu\big(\|\Pi^\mu_t-\Pi^\pi_t\|_{\mathrm{TV}}\big)>0.$$

Proof. Let $\mathbf P^\mu(\xi_n\in\cdot)=\mu_n+\mu_n^\perp$ be the Lebesgue decomposition of
$\mathbf P^\mu(\xi_n\in\cdot)$ with respect to $\pi$. In particular, $\mu_n\ll\pi$ and $\mu_n^\perp\perp\pi$, and there
exists a set $S_n$ such that $\pi(S_n)=0$ and $\mu_n^\perp(S_n^c)=0$. We claim that

$$\limsup_{t\to\infty}\|\mathbf P^\mu(\xi_t\in\cdot)-\pi\|_{\mathrm{TV}}>0\quad\Longrightarrow\quad\limsup_{n\to\infty}\mathbf P^\mu(\xi_n\in S_n)>0.$$

To see this, note that $(\xi_n)_{n\in\mathbb Z_+}$ is a discrete time Markov chain on the state
space $E$. By [28], Theorem 7.2, Assumption 6.1 and $\mathbf P^\mu(\xi_n\in S_n)\to0$ as
$n\to\infty$ would imply that $\|\mathbf P^\mu(\xi_n\in\cdot)-\pi\|_{\mathrm{TV}}\to0$ as $n\to\infty$. But $\|\mathbf P^\mu(\xi_t\in\cdot)-\pi\|_{\mathrm{TV}}$
is nonincreasing with $t$, so the latter implies that $\|\mathbf P^\mu(\xi_t\in\cdot)-\pi\|_{\mathrm{TV}}\to0$
as $t\to\infty$. The claim is therefore established by contradiction.

Now note that it is easily established, using the expression in the proof
of Lemma 6.5, that Assumption 6.2 implies $\Pi^\pi_n(\eta,\cdot)\sim\pi$ for every $\eta\in\Omega^\eta$.
Therefore, evidently $\Pi^\pi_n(\eta,S_n)=0$ for all $\eta\in\Omega^\eta$, and we can estimate as
follows:

$$\Pi^\mu_n(\eta,S_n)=|\Pi^\mu_n(\eta,S_n)-\Pi^\pi_n(\eta,S_n)|\le\|\Pi^\mu_n(\eta,\cdot)-\Pi^\pi_n(\eta,\cdot)\|_{\mathrm{TV}}.$$

In particular, we find that

$$\mathbf P^\mu(\xi_n\in S_n)=\mathbf E^\mu\big(\Pi^\mu_n((\eta_t)_{0\le t\le n},S_n)\big)\le\mathbf E^\mu\big(\|\Pi^\mu_n-\Pi^\pi_n\|_{\mathrm{TV}}\big)$$

and we must therefore have

$$\limsup_{n\to\infty}\mathbf E^\mu\big(\|\Pi^\mu_n-\Pi^\pi_n\|_{\mathrm{TV}}\big)>0.$$

The proof is easily completed. □

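We now proceed to prove the converse assertion. One could attempt to
adapt the corresponding discrete time proof to the current setting, but here
we choose a different approach. First, we will show using Theorem 5.2 that

$$\|\mathbf P^\mu(\xi_t\in\cdot)-\pi\|_{\mathrm{TV}}\xrightarrow{t\to\infty}0\quad\text{and}\quad\|\mathbf P^\nu(\xi_t\in\cdot)-\pi\|_{\mathrm{TV}}\xrightarrow{t\to\infty}0$$

implies that

$$\mathbf E^\nu\big(\|\Pi^\mu_n-\Pi^\pi_n\|_{\mathrm{TV}}\big)\xrightarrow{n\to\infty}0,$$

where the limit as $n\to\infty$ is taken along the integers $n\in\mathbb N$. In the second
step, we will show that the function

$$t\mapsto\mathbf E^\nu\big(\|\Pi^\mu_t-\Pi^\pi_t\|_{\mathrm{TV}}\big)\quad(t\in\mathbb R_+)$$

converges to a limit when we let $t\to\infty$ along the positive reals. Taken
together, these two facts complete the proof of Theorem 6.6.

Lemma 6.11. Suppose that Assumptions 6.1 and 6.2 hold and that
$\|\mathbf P^\mu(\xi_t\in\cdot)-\pi\|_{\mathrm{TV}}\to0$ and $\|\mathbf P^\nu(\xi_t\in\cdot)-\pi\|_{\mathrm{TV}}\to0$ as $t\to\infty$.
Then $\mathbf E^\nu(\|\Pi^\mu_n-\Pi^\pi_n\|_{\mathrm{TV}})\to0$ as $n\to\infty$ ($n\in\mathbb N$).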

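Proof. Let $\tilde\Pi^{\tilde\mu}_n$ and $\tilde\Pi^{\tilde\pi}_n$ be the filters for the discrete time chain as
defined in Lemma 5.1, where $\tilde\mu=\mathbf P^\mu(X_0\in\cdot)$. Note that, using the Markov
property, we find that the condition of the current result implies that

$$\|\tilde{\mathbf P}^{\tilde\mu}(X_n\in\cdot)-\tilde\pi\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0.$$

Therefore, by Assumptions 6.1 and 6.2, Lemma 6.8 and Theorem 5.2, we
find that

$$\|\tilde\Pi^{\tilde\mu}_n-\tilde\Pi^{\tilde\pi}_n\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0\quad\tilde{\mathbf P}^{\tilde\mu}\text{-a.s.}$$

It follows directly that

$$\big\|\tilde\Pi^{\tilde\mu}_n(Y,\xi(1)\in\cdot)-\tilde\Pi^{\tilde\pi}_n(Y,\xi(1)\in\cdot)\big\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0\quad\mathbf P^\mu\text{-a.s.}$$

But note that $\tilde\Pi^{\tilde\mu}_n(Y,\xi(1)\in\cdot)$ and $\tilde\Pi^{\tilde\pi}_n(Y,\xi(1)\in\cdot)$ are versions of the regular
conditional probabilities

$$\mathbf P^\mu(\xi_{n+1}\in\cdot\,|\,\mathcal F^\eta_{[0,n+1]})\quad\text{and}\quad\mathbf P(\xi_{n+1}\in\cdot\,|\,\mathcal F^\eta_{[0,n+1]}),$$

respectively. By the a.s. uniqueness of regular conditional probabilities and
using Lemma 6.9 (which holds by virtue of Assumption 6.2), we therefore
find that

$$\|\Pi^\mu_n-\Pi^\pi_n\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0\quad\mathbf P^\nu\text{-a.s.}$$

The result follows by dominated convergence. □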

Lemma 6.12. Suppose that Assumption 6.2 holds and that 

||P M (& G •) - 7f|| T V and \\P v {^ t G •) - 7r|| T v 0. 

Then E y (||II^ — LT^Htv) is convergent as t — > oo (t G R+ ). 

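Proof. Let $\rho=(\mu+\pi)/2$. Then we can establish, exactly as in the
proof of Lemma 5.6, that we have $\Pi_t^\mu\ll\Pi_t^\rho$ and $\Pi_t^\pi\ll\Pi_t^\rho$ with

$$\frac{d\Pi_t^\mu}{d\Pi_t^\rho}=\frac{\mathbf{E}^\rho((d\mu/d\rho)(\xi_0)\,|\,\mathcal{F}^\eta_+\vee\mathcal{F}^\xi_{[t,\infty[})}{\mathbf{E}^\rho((d\mu/d\rho)(\xi_0)\,|\,\mathcal{F}^\eta_{[0,t]})},\qquad
\frac{d\Pi_t^\pi}{d\Pi_t^\rho}=\frac{\mathbf{E}^\rho((d\pi/d\rho)(\xi_0)\,|\,\mathcal{F}^\eta_+\vee\mathcal{F}^\xi_{[t,\infty[})}{\mathbf{E}^\rho((d\pi/d\rho)(\xi_0)\,|\,\mathcal{F}^\eta_{[0,t]})}\qquad\mathbf{P}^\rho\text{-a.s.}$$

Note that $\mathbf{E}^\rho(d\Pi_t^\mu/d\Pi_t^\rho)=\mathbf{E}^\rho(d\Pi_t^\pi/d\Pi_t^\rho)=1$ for all $t$. Now fix an arbitrary
sequence $t_k\nearrow\infty$. By the martingale convergence theorem, we have $\mathbf{P}^\rho$-a.s.

$$\mathbf{E}^\rho\Big(\frac{d\mu}{d\rho}(\xi_0)\,\Big|\,\mathcal{F}^\eta_{[0,t_k]}\Big)\to\mathbf{E}^\rho\Big(\frac{d\mu}{d\rho}(\xi_0)\,\Big|\,\mathcal{F}^\eta_+\Big),\qquad
\mathbf{E}^\rho\Big(\frac{d\pi}{d\rho}(\xi_0)\,\Big|\,\mathcal{F}^\eta_{[0,t_k]}\Big)\to\mathbf{E}^\rho\Big(\frac{d\pi}{d\rho}(\xi_0)\,\Big|\,\mathcal{F}^\eta_+\Big)$$

as $k\to\infty$. Moreover, these quantities are $\mathbf{P}^\rho$-a.s. strictly positive by Lemma 6.9.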
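Applying again the martingale convergence theorem, we find that $M_k^\mu:=d\Pi_{t_k}^\mu/d\Pi_{t_k}^\rho$
and $M_k^\pi:=d\Pi_{t_k}^\pi/d\Pi_{t_k}^\rho$ converge $\mathbf{P}^\rho$-a.s. to the random variables

$$M^\mu=\frac{\mathbf{E}^\rho((d\mu/d\rho)(\xi_0)\,|\,\bigcap_{t\ge0}\mathcal{F}^\eta_+\vee\mathcal{F}^\xi_{[t,\infty[})}{\mathbf{E}^\rho((d\mu/d\rho)(\xi_0)\,|\,\mathcal{F}^\eta_+)}$$

and

$$M^\pi=\frac{\mathbf{E}^\rho((d\pi/d\rho)(\xi_0)\,|\,\bigcap_{t\ge0}\mathcal{F}^\eta_+\vee\mathcal{F}^\xi_{[t,\infty[})}{\mathbf{E}^\rho((d\pi/d\rho)(\xi_0)\,|\,\mathcal{F}^\eta_+)},$$

respectively. Moreover, by the tower property of the conditional expectation,
we have $\mathbf{E}^\rho(M^\mu|\mathcal{F}^\eta_+)=1$ and $\mathbf{E}^\rho(M^\pi|\mathcal{F}^\eta_+)=1$ $\mathbf{P}^\rho$-a.s. Therefore,
$\mathbf{E}^\rho(M^\mu)=\mathbf{E}^\rho(M^\pi)=1$, so that $M_k^\mu\to M^\mu$ and $M_k^\pi\to M^\pi$ in $L^1(\mathbf{P}^\rho)$ by
Scheffé's lemma.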

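Let us write, for simplicity, $N_k=|M_k^\mu-M_k^\pi|$ and $N=|M^\mu-M^\pi|$. Then

$$\begin{aligned}
&\mathbf{E}^\rho(|\mathbf{E}^\rho(N_k|\mathcal{F}^\eta_{[0,t_k]})-\mathbf{E}^\rho(N|\mathcal{F}^\eta_+)|)\\
&\quad\le\mathbf{E}^\rho(|\mathbf{E}^\rho(N_k|\mathcal{F}^\eta_{[0,t_k]})-\mathbf{E}^\rho(N|\mathcal{F}^\eta_{[0,t_k]})|)+\mathbf{E}^\rho(|\mathbf{E}^\rho(N|\mathcal{F}^\eta_{[0,t_k]})-\mathbf{E}^\rho(N|\mathcal{F}^\eta_+)|)\\
&\quad\le\mathbf{E}^\rho(|N_k-N|)+\mathbf{E}^\rho(|\mathbf{E}^\rho(N|\mathcal{F}^\eta_{[0,t_k]})-\mathbf{E}^\rho(N|\mathcal{F}^\eta_+)|)\\
&\quad\le\mathbf{E}^\rho(|M_k^\mu-M_k^\pi-M^\mu+M^\pi|)+\mathbf{E}^\rho(|\mathbf{E}^\rho(N|\mathcal{F}^\eta_{[0,t_k]})-\mathbf{E}^\rho(N|\mathcal{F}^\eta_+)|)\\
&\quad\le\mathbf{E}^\rho(|M_k^\mu-M^\mu|)+\mathbf{E}^\rho(|M_k^\pi-M^\pi|)+\mathbf{E}^\rho(|\mathbf{E}^\rho(N|\mathcal{F}^\eta_{[0,t_k]})-\mathbf{E}^\rho(N|\mathcal{F}^\eta_+)|),
\end{aligned}$$

where we have used the inverse triangle inequality to establish that
$|N_k-N|\le|M_k^\mu-M_k^\pi-M^\mu+M^\pi|$. By the martingale convergence theorem and
the convergence of $M_k^\mu$ and $M_k^\pi$, the right-hand side of this expression
converges to zero as $k\to\infty$. But note that $\|\Pi_{t_k}^\mu-\Pi_{t_k}^\pi\|_{\rm TV}=\mathbf{E}^\rho(N_k|\mathcal{F}^\eta_{[0,t_k]})$
$\mathbf{P}^\rho$-a.s., so we have

$$\|\Pi_{t_k}^\mu-\Pi_{t_k}^\pi\|_{\rm TV}\xrightarrow{k\to\infty}\mathbf{E}^\rho(N|\mathcal{F}^\eta_+)\qquad\text{in }L^1(\mathbf{P}^\rho).$$

In particular, $\|\Pi_{t_k}^\mu-\Pi_{t_k}^\pi\|_{\rm TV}$ converges to $\mathbf{E}^\rho(N|\mathcal{F}^\eta_+)$ in $\mathbf{P}^\rho$-probability. But

$$\begin{aligned}
\|\mathbf{P}^\nu(\xi_t\in\cdot\,)-\mathbf{P}^\rho(\xi_t\in\cdot\,)\|_{\rm TV}
&\le\tfrac12\big(\|\mathbf{P}^\nu(\xi_t\in\cdot\,)-\mathbf{P}^\mu(\xi_t\in\cdot\,)\|_{\rm TV}+\|\mathbf{P}^\nu(\xi_t\in\cdot\,)-\pi\|_{\rm TV}\big)\\
&\le\tfrac12\big(\|\mathbf{P}^\mu(\xi_t\in\cdot\,)-\pi\|_{\rm TV}+2\|\mathbf{P}^\nu(\xi_t\in\cdot\,)-\pi\|_{\rm TV}\big)\xrightarrow{t\to\infty}0,
\end{aligned}$$

so by Lemma 6.9 we find that $\|\Pi_{t_k}^\mu-\Pi_{t_k}^\pi\|_{\rm TV}$ converges to $\mathbf{E}^\rho(N|\mathcal{F}^\eta_+)$ in
$\mathbf{P}^\nu$-probability. Thus, we have $\mathbf{E}^\nu(\|\Pi_{t_k}^\mu-\Pi_{t_k}^\pi\|_{\rm TV})\to\mathbf{E}^\nu(\mathbf{E}^\rho(N|\mathcal{F}^\eta_+))$ by dominated
convergence. But as this holds for any sequence $t_k\nearrow\infty$, the result
follows. □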
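
The total variation bound used near the end of the proof can be unpacked as follows. This is only a sketch: it assumes $\rho=(\mu+\pi)/2$, so that $\mathbf{P}^\rho=\tfrac12(\mathbf{P}^\mu+\mathbf{P}^\pi)$, and that $\mathbf{P}^\pi(\xi_t\in\cdot\,)=\pi$ by stationarity. The shorthand $\mu_t$, $\nu_t$ for the time-$t$ laws of the signal is introduced here for brevity and is not the paper's notation.

```latex
% Shorthand (assumption of this remark): \mu_t = P^\mu(\xi_t \in \cdot), \nu_t = P^\nu(\xi_t \in \cdot).
\begin{align*}
\bigl\|\nu_t-\tfrac12(\mu_t+\pi)\bigr\|_{\rm TV}
 &= \tfrac12\bigl\|(\nu_t-\mu_t)+(\nu_t-\pi)\bigr\|_{\rm TV} \\
 &\le \tfrac12\bigl(\|\nu_t-\mu_t\|_{\rm TV}+\|\nu_t-\pi\|_{\rm TV}\bigr) \\
 &\le \tfrac12\bigl(\|\mu_t-\pi\|_{\rm TV}+2\,\|\nu_t-\pi\|_{\rm TV}\bigr).
\end{align*}
```

The last step applies the triangle inequality once more, in the form $\|\nu_t-\mu_t\|_{\rm TV}\le\|\nu_t-\pi\|_{\rm TV}+\|\pi-\mu_t\|_{\rm TV}$.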

7. On the result of Kunita and necessity of the ergodic condition. In
Sections 5 and 6 we explored the consequences of our main results for the
stability of nonlinear filters. Our results also have implications for other 
asymptotic properties of the filter, however, in particular for the uniqueness 
of the invariant measure as studied in [23]. The aim of this section is to 
briefly outline the connection with [23], and to compare our assumptions to 
those made in the work of Kunita. 
Kunita's original paper [23] investigated the continuous time setting with
compact signal state space and white noise type observations. His approach 
has been extended to locally compact [32] and Polish [2] signal state spaces 
on the one hand, and to discrete time models on locally compact [32] and 
Polish [17] signal state spaces on the other hand. None of these papers resolve 
the gap in [23], however; we refer to [4] for further discussion and references. 
For simplicity and concreteness, we will restrict our discussion below to the 
original setting of Kunita. However, the results in this paper apply to all 
settings considered in the above references, and the reader interested in the 
ergodic properties of the nonlinear filter can directly read off the relevant 
results from these papers. 

In [23], the signal process $(\xi_t)_{t\in\mathbb{R}}$ is a stationary, time-homogeneous
Feller-Markov process on a compact Polish state space $E$ with stationary measure
$\pi$ under $\mathbf{P}$, and the $\mathbb{R}^d$-valued observation process $(\eta_t)_{t\in\mathbb{R}}$ is defined as

$$\eta_t=\int_0^t h(\xi_s)\,ds+W_t,$$

where $(W_t)_{t\in\mathbb{R}}$ is a two-sided Wiener process and $h:E\to\mathbb{R}^d$ is a continuous
function. Kunita establishes that the filter $\Pi_t^\mu$, when seen as a
measure-valued random process, is itself a Feller-Markov process, and we are inter-
ested in the ergodic properties of this process. In particular, [23] yields the 
following statement. 

Proposition 7.1. There exists at least one invariant measure for the
filter whose barycenter is $\pi$. If $\bigcap_{t\ge0}\mathcal{F}^\eta_0\vee\mathcal{F}^\xi_{-t}=\mathcal{F}^\eta_0$ $\mathbf{P}$-a.s. holds true, then
there is only one such invariant measure. If in addition $\pi$ is the unique
invariant measure of the signal, then the invariant measure of the filter is
unique.

In [23], it is assumed that the $\mathbf{P}$-a.s. triviality of the tail $\sigma$-field $\bigcap_{t\ge0}\mathcal{F}^\xi_{-t}$,
or, equivalently [34], Proposition 3, the condition

$$\int|\mathbf{P}^{\delta_z}(\xi_t\in A)-\pi(A)|\,\pi(dz)\xrightarrow{t\to\infty}0\quad\text{for all }A\in\mathcal{B}(E),$$

is already sufficient to establish $\bigcap_{t\ge0}\mathcal{F}^\eta_0\vee\mathcal{F}^\xi_{-t}=\mathcal{F}^\eta_0$ $\mathbf{P}$-a.s. As we have
argued before, however, this statement is not at all obvious. On the other
hand, by the continuous time version of our main result (Theorem 6.4), it
follows that

$$\int\sup_{A\in\mathcal{B}(E)}|\mathbf{P}^{\delta_z}(\xi_t\in A)-\pi(A)|\,\pi(dz)\xrightarrow{t\to\infty}0$$

does in fact guarantee that $\bigcap_{t\ge0}\mathcal{F}^\eta_0\vee\mathcal{F}^\xi_{-t}=\mathcal{F}^\eta_0$ $\mathbf{P}$-a.s. [that this condition is
equivalent to Assumption 6.1 follows from the fact that $\|\mathbf{P}^{\delta_z}(\xi_t\in\cdot\,)-\pi\|_{\rm TV}$
is nonincreasing]. This condition covers most, but not all, of the models
that satisfy Kunita's condition, and we have thus partially resolved the gap 
in his proof. Whether Kunita's condition is already sufficient to guarantee 
uniqueness of the invariant measure with barycenter $\pi$ remains an open
problem. 
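
The monotonicity invoked in the bracketed remark above can be checked numerically in a toy setting. The sketch below is a discrete-time, finite-state analogue rather than the continuous-time model of the paper: the three-state kernel `P`, its stationary law `PI` and all function names are hypothetical choices made for illustration. It computes the total variation distance to stationarity from each starting point, together with its average against the stationary law.

```python
# Hypothetical 3-state chain, chosen only for illustration (not from the paper).
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
PI = [0.25, 0.5, 0.25]  # stationary law of P, i.e. PI P = PI


def step(law, kernel):
    """One step of the chain: returns the row vector law * kernel."""
    n = len(kernel)
    return [sum(law[i] * kernel[i][j] for i in range(n)) for j in range(n)]


def tv(p, q):
    """Total variation norm in the convention ||p - q|| = sum_i |p_i - q_i|."""
    return sum(abs(a - b) for a, b in zip(p, q))


def tv_path(z, horizon):
    """Distances between the law of X_n started at z and PI, for n = 0..horizon."""
    law = [1.0 if i == z else 0.0 for i in range(len(P))]
    out = []
    for _ in range(horizon + 1):
        out.append(tv(law, PI))
        law = step(law, P)
    return out


def averaged_tv(horizon):
    """The same distances integrated against PI(dz), for n = 0..horizon."""
    paths = [tv_path(z, horizon) for z in range(len(P))]
    return [
        sum(PI[z] * paths[z][n] for z in range(len(P)))
        for n in range(horizon + 1)
    ]
```

For this kernel the distance from either outer state halves at each step after the first, and the averaged distance tends to zero, so the pointwise and the integrated forms of the condition agree in this example.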

Besides sufficiency of the ergodic condition, it is interesting to ask whether 
such a condition is necessary for uniqueness of the invariant measure. Theorem 3.3
of [23] states that Kunita's condition is in fact necessary for uniqueness
of the invariant measure with barycenter $\pi$, but this does not appear
to be correct. As the following example shows, neither our condition nor 
Kunita's condition is necessary. 

Example 7.2. Consider the signal on $E=[0,1]$ such that $\xi_t=\xi_0$ for all
$t\in\mathbb{R}$ $\mathbf{P}$-a.s., and let $\pi$ be the Lebesgue measure on $[0,1]$. We choose the
observation function h(x) = x. This model fits entirely within the current 
setting. 

Let us first show that the signal does not satisfy Kunita's condition (and 
hence it does not satisfy our assumptions, which are stronger than Kunita's). 
Note that 

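$$\mathcal{F}^\xi_{-t}=\sigma\{\xi_s:s\le-t\}=\sigma\{\xi_0\}\qquad\mathbf{P}\text{-a.s. for all }t\in\mathbb{R}.$$

Therefore, $\mathbf{P}$-a.s. $\bigcap_{t\ge0}\mathcal{F}^\xi_{-t}=\sigma\{\xi_0\}$, which is certainly not $\mathbf{P}$-a.s. trivial.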

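We claim that nonetheless $\bigcap_{t\ge0}\mathcal{F}^\eta_0\vee\mathcal{F}^\xi_{-t}=\mathcal{F}^\eta_0$ $\mathbf{P}$-a.s., so the invariant
measure of the filter with barycenter $\pi$ is unique. Clearly it suffices to show
that

$$\mathcal{F}^\xi_{-t}=\sigma\{\xi_0\}\subset\mathcal{F}^\eta_0\qquad\mathbf{P}\text{-a.s.}$$

for all $t\ge0$. But note that $\eta_t=\xi_0t+W_t$ for all $t\in\mathbb{R}$, so

$$\limsup_{t\to-\infty}\frac{\eta_t}{t}=\xi_0\qquad\mathbf{P}\text{-a.s.}$$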

The claim is therefore established. 
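
The limit in Example 7.2 is easy to see numerically. The following sketch simulates $\eta_t=\xi_0t+W_t$ along a single Brownian path at a few large negative times and forms the ratio $\eta_t/t$. The function name, the time grid and the seed are illustrative assumptions, not part of the example.

```python
import math
import random


def ratios_along_path(xi0, horizons, rng):
    """Return eta_t / t at t = -s for each s in `horizons` (increasing, positive).

    The model is eta_t = xi0 * t + W_t with W a two-sided Wiener process,
    so s -> W_{-s} is itself a Brownian motion started at 0.
    """
    w, prev, out = 0.0, 0.0, []
    for s in horizons:
        # Independent Gaussian increment of the Brownian path over (prev, s].
        w += math.sqrt(s - prev) * rng.gauss(0.0, 1.0)
        prev = s
        t = -s
        out.append((xi0 * t + w) / t)
    return out


rng = random.Random(0)            # fixed seed, only for reproducibility
xi0 = rng.random()                # xi_0 drawn uniformly from [0, 1)
horizons = [1e2, 1e4, 1e6, 1e8]   # |t| values at which the path is sampled
ratios = ratios_along_path(xi0, horizons, rng)
errors = [abs(r - xi0) for r in ratios]
```

The error of the ratio equals $|W_t|/|t|$, which is typically of order $|t|^{-1/2}$, so the estimate of $\xi_0$ sharpens as $t\to-\infty$.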



The previous example highlights a possibility which is not considered
in this paper. Returning to our canonical model, suppose that the tail $\sigma$-field
$\mathcal{T}^X$ is not $\mathbf{P}$-a.s. trivial (so the signal is not ergodic), but that
$\mathcal{T}^X\subset\mathcal{F}^Y_{[0,\infty[}$ $\mathbf{P}$-a.s. Then, if it could somehow be established that the exchange of
intersection and supremum is permitted, we would still obtain the identity

$$\bigcap_{n\ge0}\mathcal{F}^Y_{[0,\infty[}\vee\mathcal{F}^X_{[n,\infty[}=\mathcal{F}^Y_{[0,\infty[}\vee\bigcap_{n\ge0}\mathcal{F}^X_{[n,\infty[}=\mathcal{F}^Y_{[0,\infty[}\qquad\mathbf{P}\text{-a.s.},$$

and therefore also the associated implications for the stability properties and
for the uniqueness of the invariant measure of the filter. The condition
$\mathcal{T}^X\subset\mathcal{F}^Y_{[0,\infty[}$ is closely related to the notion of detectability which is shown in [36] to
be necessary and sufficient for the stability of the filter (in a suitable sense) 
for models with a finite signal state space and nondegenerate observations. 
Whether such a necessary and sufficient condition can be obtained for more 
general models in the absence of an ergodicity assumption is an interesting 
topic for further investigation. 